Back in January, Yiannis (Vlassopoulos) and I were talking about “quadratic relations” and higher concepts in language, for example, the analogy between (king, queen) and (man, woman). Deep neural networks can learn such relations from a set of natural-language texts, called a corpus. But there are other ways of learning such relations:
| Representation of corpus | How to learn / compute |
|---|---|
| n-grams | string matching |
| word embeddings | standard algorithms, e.g. neural nets |
| presyntactic category | computational category theory |
| bar construction | Koszul dual |
| formal concept lattice | TBD |
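To see what the first row amounts to in practice, here is a minimal sketch of n-gram counting: just record which words occur next to which. (The nine-word corpus is invented for illustration; a real corpus has millions of tokens.)

```python
from collections import Counter

# Toy corpus, invented for illustration.
corpus = "the royal woman greeted the king and the queen".split()

# Bigram counts: the raw record of which words sit next to which.
# Nothing at this level knows that "royal woman" and "queen" are related.
bigrams = Counter(zip(corpus, corpus[1:]))
print(bigrams.most_common(3))
```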
There are some very nice connections between all five representations. In a way, they’re all struggling to get away from the raw, syntactic, “1-dimensional” data of word collocation to something higher-order, something semantic. (For example, “royal + woman = queen” is semantic; “royal + woman = royal woman” is not.) I’d like to tell a bit of that story here.
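Before getting into that story, here is the standard word-embedding picture of the analogy in code: each word becomes a vector, and (king, queen) ~ (man, woman) becomes an approximate vector equation. The 2-dimensional vectors below are hand-made toys, not trained from a corpus; a real model would learn hundreds of dimensions.

```python
import numpy as np

# Hand-made toy "embeddings": axis 0 ~ royalty, axis 1 ~ gender.
vecs = {
    "king":  np.array([1.0,  1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0,  1.0]),
    "woman": np.array([0.0, -1.0]),
    "apple": np.array([-1.0, 0.2]),
}

def nearest(target, exclude=()):
    """Word whose vector is closest to `target` in cosine similarity."""
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return max((w for w in vecs if w not in exclude),
               key=lambda w: cos(vecs[w], target))

# The analogy (king : queen) :: (man : woman) as vector arithmetic.
query = vecs["king"] - vecs["man"] + vecs["woman"]
print(nearest(query, exclude={"king", "man", "woman"}))  # queen
```

The point of the toy is only the shape of the computation: “royal + woman = queen” becomes addition and nearest-neighbor search in the embedding space, which is what makes it semantic rather than mere string concatenation.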