You shall know a word by the company it keeps
Datasets: Frankenstein, AstroBlackness, WikiHarass, Learning from Deep Learning, nearbySaussure
Developed by: Google TensorFlow's word2vec, Algolit
You shall know a word by the company it keeps is a series of five landscapes, each based on a different dataset. Each landscape shows the words 'human', 'learning' and 'system' in the company of different semantic clusters. The belief that distances in the graph correspond to semantic similarity between words is one of the basic ideas behind word2vec.
The graphs are the result of a code study based on an existing word-embedding tutorial script, word2vec_basic.py. In machine learning practice, graphs like these function as one of the validation tools used to check whether a model is starting to make sense. It is interesting how this validation process is fuelled by an individual, semantic understanding of the clusters and the words.
How can we use these semantic landscapes as reading tools?
- Vocabulary size: 5000
- Algorithm: Adam Optimizer
- Learning rate: 0.01
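The "Nearest to" lists below are produced by ranking the vocabulary by cosine similarity to the embedding of the query word. A minimal sketch of that ranking step, using hypothetical toy vectors (the real trained embeddings have many more dimensions and 5000 entries):

```python
from math import sqrt

def cosine(u, v):
    # Cosine similarity: dot product divided by the vector norms.
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Hypothetical toy embeddings, for illustration only.
embeddings = {
    "human":   [0.9, 0.1, 0.0],
    "fellow":  [0.8, 0.2, 0.1],
    "system":  [0.1, 0.9, 0.2],
    "steeple": [0.0, 0.2, 0.9],
}

def nearest(word, k=2):
    # Rank all other words by similarity to the query word.
    q = embeddings[word]
    ranked = sorted(
        ((w, cosine(q, v)) for w, v in embeddings.items() if w != word),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [w for w, _ in ranked[:k]]

print(nearest("human"))  # → ['fellow', 'system']
```

In the landscapes, these nearest neighbours are what cluster visually around each query word.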
graph 1: Frankenstein
Includes the book Frankenstein; or, The Modern Prometheus by Mary Shelley.
loss value: 4.45983128536
Nearest to human: fair, active, crevice, sympathizing, pretence, fellow, nightingale, productions, deaths, medicine
Nearest to learning: steeple, clump, electricity, security, foretaste, fluctuating, finding, gazes, pour, decides
Nearest to system: philosophy, coincidences, threatening, selfcontrol, distinctly, babe, stream, chimney, recess, accounts
graph 2: AstroBlackness
A selection of texts from an Afrofuturist perspective.
loss value: 5.8195698024
Nearest to human: black, difference, white, gender, otherwise, 3, 7, ignorance, contemporary, greater
Nearest to learning: superior, truth, function, lens, start, dying, existence, changing, symbol, place
Nearest to system: attempts, adapt, programmed, varieties, limit, realization, color, promise, population, voice
graph 3: nearbySaussure
Includes three secondary books about Saussure's work in structuralist linguistics.
loss value: 5.78265964687
Nearest to human: cultural, 181, psychic, Human, rational, physical, story, chance, domain, furthermore
Nearest to system: structure, content, community, System, term, center, study, plurality, form, value
The word 'learning' did not appear in the list of 5000 most common words.
graph 4: Learning from Deep Learning
Includes seven textbooks on the topic of deep learning.
loss value: 6.65393904257
Nearest to human: healthy, given, modeling, poorly, inspired, criterion, specifically, Accuracy, surface, predicting
Nearest to learning: Learning, pretrained, sparse, neat, 21, inference, tuning, adagrad, tested, Use
Nearest to system: UNK, roi, dataframe, code, win, page, approach, diagonal, cae, letter
graph 5: WikiHarass
Includes examples of harassment drawn from Talk page comments on Wikipedia.
loss value: 3.93717244664
Nearest to human: jacob, Persianyes, phrase, track, star, attack, puts, jews, helps, plastic
Nearest to learning: sound, people, getting, writing, thinking, talking, thoughts, modify, less, prince
Nearest to system: armenian, UNK, georgia, george, n, developed, its, each, daniele, claim