You shall know a word by the company it keeps

From Algolit

Type: Algoliterary exploration
Datasets: Frankenstein, AstroBlackness, WikiHarass, Learning from Deep Learning, nearbySaussure
Technique: word embeddings
Developed by: Google TensorFlow's word2vec, Algolit

You shall know a word by the company it keeps is a series of five landscapes based on different datasets. Each landscape shows the words 'human', 'learning' and 'system' in the company of different semantic clusters. The belief that distances in the graph correspond to semantic similarity between words is one of the basic ideas behind word2vec.

The graphs are the result of a code study based on an existing word-embedding tutorial script. In machine learning practice, graphs like these serve as one of the validation tools to check whether a model starts to make sense. It is interesting how this validation process is fuelled by an individual's semantic understanding of the clusters and the words.
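The step from high-dimensional word vectors to a two-dimensional landscape can be sketched as follows. The tutorial script uses t-SNE for this projection; as a simpler stand-in, this sketch uses a PCA projection computed with NumPy, and a random matrix takes the place of trained embeddings.

```python
import numpy as np

# Hypothetical stand-in for a trained embedding matrix:
# 100 words, each represented by a 128-dimensional vector.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 128))

# PCA projection (the tutorial uses t-SNE; PCA is a simpler stand-in):
# centre the point cloud, then project onto its two principal axes.
centered = embeddings - embeddings.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
coords_2d = centered @ vt[:2].T  # one (x, y) position per word

print(coords_2d.shape)  # (100, 2)
```

Each row of `coords_2d` would then be plotted and labelled with its word, producing a landscape like the five below.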

How can we use these semantic landscapes as reading tools?


graph 1: Frankenstein

Includes the book Frankenstein; or, The Modern Prometheus by Mary Shelley.

loss value: 4.45983128536
Nearest to human: fair, active, crevice, sympathizing, pretence, fellow, nightingale, productions, deaths, medicine,
Nearest to learning: steeple, clump, electricity, security, foretaste, fluctuating, finding, gazes, pour, decides,
Nearest to system: philosophy, coincidences, threatening, selfcontrol, distinctly, babe, stream, chimney, recess, accounts,
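Lists like the 'Nearest to' ones above come from a cosine-similarity lookup over the embedding matrix. A minimal sketch, using hypothetical toy vectors in place of a trained word2vec model:

```python
import numpy as np

# Toy embedding matrix: in the actual piece these vectors come from
# training word2vec on the dataset; here they are made-up
# 4-dimensional vectors, for illustration only.
vocab = ["human", "fair", "active", "system", "philosophy", "steeple"]
embeddings = np.array([
    [0.9, 0.1, 0.0, 0.2],   # human
    [0.8, 0.2, 0.1, 0.3],   # fair
    [0.7, 0.3, 0.0, 0.1],   # active
    [0.1, 0.9, 0.8, 0.0],   # system
    [0.2, 0.8, 0.9, 0.1],   # philosophy
    [0.0, 0.1, 0.2, 0.9],   # steeple
])

def nearest(word, k=2):
    """Return the k words whose vectors have the highest cosine
    similarity to `word` -- the measure behind the 'Nearest to'
    lists in the graphs."""
    idx = vocab.index(word)
    norms = np.linalg.norm(embeddings, axis=1)
    sims = embeddings @ embeddings[idx] / (norms * norms[idx])
    sims[idx] = -np.inf  # exclude the word itself
    order = np.argsort(-sims)[:k]
    return [vocab[i] for i in order]

print(nearest("human"))  # ['fair', 'active']
```

With real embeddings the lookup is identical, only over a vocabulary of thousands of words.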

[Images: Detail-frankenstein.png, 5 graphs frankenstein gutenberg tf.png]

graph 2: AstroBlackness

A selection of texts written from an Afrofuturist perspective.

loss value: 5.8195698024
Nearest to human: black, difference, white, gender, otherwise, 3, 7, ignorance, contemporary, greater,
Nearest to learning: superior, truth, function, lens, start, dying, existence, changing, symbol, place,
Nearest to system: attempts, adapt, programmed, varieties, limit, realization, color, promise, population, voice,

[Images: Detail-astroBlackness.png, 5 graphs astroBlackness.png]

graph 3: nearbySaussure

Includes three secondary books about Saussure's work in structuralist linguistics.

loss value: 5.78265964687
Nearest to human: cultural, 181, psychic, Human, rational, physical, story, chance, domain, furthermore,
Nearest to system: structure, content, community, System, term, center, study, plurality, form, value,

The word 'learning' did not appear in the list of 5000 most common words.
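This cut-off comes from the vocabulary-building step of the tutorial script: only the most frequent tokens receive an embedding, and every other token is mapped to 'UNK' (which is also why 'UNK' itself shows up as a nearest neighbour in graphs 4 and 5). A minimal sketch with a toy corpus:

```python
from collections import Counter

# Toy corpus and cut-off; the actual landscapes keep the
# 5000 most common words.
corpus = "the sign the signified the signifier the system of signs".split()
N = 4

# Keep the N most frequent tokens (one slot reserved for 'UNK'),
# then replace everything outside that vocabulary with 'UNK'.
counts = Counter(corpus)
vocab = ["UNK"] + [w for w, _ in counts.most_common(N - 1)]
data = [w if w in vocab else "UNK" for w in corpus]
```

A word like 'learning' that falls outside the cut-off is never embedded, so no neighbour list can be produced for it.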

[Images: Detail-nearbySaussure.png, 5 graphs nearbySaussure.png]

graph 4: Learning from Deep Learning

Includes seven textbooks on the topic of deep learning.

loss value: 6.65393904257
Nearest to human: healthy, given, modeling, poorly, inspired, criterion, specifically, Accuracy, surface, predicting,
Nearest to learning: Learning, pretrained, sparse, neat, 21, inference, tuning, adagrad, tested, Use,
Nearest to system: UNK, roi, dataframe, code, win, page, approach, diagonal, cae, letter,

[Images: Detail-learning-deep-learning.png, 5 graphs deep-learning-trainingset.png]

graph 5: WikiHarass

Includes examples of harassment drawn from Talk page comments on Wikipedia.

loss value: 3.93717244664
Nearest to human: jacob, Persianyes, phrase, track, star, attack, puts, jews, helps, plastic,
Nearest to learning: sound, people, getting, writing, thinking, talking, thoughts, modify, less, prince,
Nearest to system: armenian, UNK, georgia, george, n, developed, its, each, daniele, claim,

[Images: Detail-WikiHarass.png, 5 graphs Talk page comments from Wikipedia stripped.png]