This post begins a review of “Automatic Discovery of Similar Words,” by Pierre Senellart and Vincent D. Blondel, published as Chapter 2 in Berry and Castellanos’ Survey of Text Mining II.
This is an excellent and useful chapter, in that it:
1) Addresses the broad issue of computational methods for discovering “similar words” (synonyms and near-synonyms, along with thesaurus-generation techniques) from large data corpora,
2) Illustrates the leading mathematical methods, giving an excellent overview of the state of the art,
3) Competently discusses how different methods perform in domain-specific vs. broad corpora, and also addresses related methods for determining semantic similarity and/or distance.
Their unique contribution is a graph-based method that works on a dictionary graph. This is particularly interesting and useful, as graph-theoretic methods are growing in importance.
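For readers who have not met the idea, here is a minimal sketch of what a dictionary graph might look like. It assumes the usual construction: each headword is a vertex, and an edge runs from a word to every headword used in its definition. The toy definitions and the function names (`build_dictionary_graph`, `neighborhood`) are hypothetical, made up for illustration. This is not the authors' code, and it does not attempt their similarity computation.

```python
def build_dictionary_graph(definitions):
    """Return adjacency sets: an edge u -> v means v appears in the definition of u.

    Assumption: only headwords of the dictionary become vertices, and
    definitions are tokenized by naive whitespace splitting.
    """
    vocabulary = set(definitions)
    graph = {word: set() for word in vocabulary}
    for head, definition in definitions.items():
        for token in definition.lower().split():
            if token in vocabulary and token != head:
                graph[head].add(token)
    return graph


def neighborhood(graph, word):
    """Return `word` plus every vertex it points to or that points to it."""
    successors = graph.get(word, set())
    predecessors = {u for u, targets in graph.items() if word in targets}
    return {word} | successors | predecessors


# Hypothetical toy dictionary, not taken from the chapter.
toy_definitions = {
    "sugar": "sweet crystalline substance",
    "sweet": "having the taste of sugar",
    "candy": "sweet food made with sugar",
    "taste": "sensation of flavor",
}

graph = build_dictionary_graph(toy_definitions)
print(sorted(neighborhood(graph, "sugar")))
```

The neighborhood of a query word is the kind of small subgraph on which a graph-based similarity score could then be computed. How that score is defined is left to the later posts in this review.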
Time constraints require that this review be done in multiple posts. More tomorrow.