"Automatic Discovery of Similar Words" – Chapter 2 in Survey of Text Mining II

This post begins a review of “Automatic Discovery of Similar Words,” by Pierre Senellart and Vincent D. Blondel, published as Chapter 2 in Berry and Castellanos’ Survey of Text Mining II.

This is an excellent and useful chapter, in that it:
1) Addresses the broad issue of computational methods for discovering “similar words” (including synonyms, near-synonyms, and thesauri-generating techniques) from large data corpora,
2) Illustrates the leading mathematical methods, giving an excellent overview of the state of the art (SoA),
3) Competently discusses how different methods perform in domain-specific vs. broad corpora, and also addresses related methods for determining semantic similarity and/or distance.

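Their own contribution is a method built on the dictionary graph: each dictionary headword is a vertex, and an edge runs from one word to another when the second word appears in the definition of the first. This is particularly interesting and useful, as graph-theoretic methods are growing in importance.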
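
To make the idea concrete, here is a minimal Python sketch of a dictionary-graph approach along the lines of Blondel and Senellart's related work. It builds the directed graph (edge u -> v when v appears in the definition of u), takes a query word's neighborhood, and ranks the neighbors by a "central score": how well each word plays the middle role of the path 1 -> 2 -> 3, computed with the normalized iteration X <- B X A^T + B^T X A from Blondel et al.'s vertex-similarity measure. The toy dictionary, the function names (build_dictionary_graph, neighborhood, central_scores, similar_words), the exact neighborhood definition and the fixed iteration count are my own assumptions for illustration, not code from the chapter.

```python
from math import sqrt


def build_dictionary_graph(dictionary):
    """Directed graph with an edge u -> v whenever word v appears in the
    definition of headword u. Self-references are dropped."""
    vertices = set(dictionary)
    edges = set()
    for headword, definition_words in dictionary.items():
        for word in definition_words:
            vertices.add(word)
            if word != headword:
                edges.add((headword, word))
    return vertices, edges


def neighborhood(edges, word):
    """Assumed neighborhood of `word`: the words in its definition plus the
    headwords whose definitions use it (excluding `word` itself), together
    with the edges among them."""
    nodes = {v for u, v in edges if u == word} | {u for u, v in edges if v == word}
    nodes.discard(word)
    sub_edges = {(u, v) for u, v in edges if u in nodes and v in nodes}
    return nodes, sub_edges


def central_scores(vertices, edges, iterations=50):
    """Similarity of each vertex to vertex 2 of the path 1 -> 2 -> 3, via the
    normalized update X <- B X A^T + B^T X A specialized to that path.
    `iterations` should be even, since the even iterates are the ones that
    converge."""
    x1 = {v: 1.0 for v in vertices}
    x2 = dict(x1)
    x3 = dict(x1)
    for _ in range(iterations):
        n1 = {v: 0.0 for v in vertices}
        n2 = {v: 0.0 for v in vertices}
        n3 = {v: 0.0 for v in vertices}
        for u, v in edges:
            n1[u] += x2[v]  # u looks like vertex 1 if it points to a good vertex 2
            n2[u] += x3[v]  # u looks like vertex 2 if it points to a good vertex 3
            n2[v] += x1[u]  # v looks like vertex 2 if a good vertex 1 points to it
            n3[v] += x2[u]  # v looks like vertex 3 if a good vertex 2 points to it
        # Frobenius norm over all three score vectors (the whole matrix X).
        norm = sqrt(sum(s * s for n in (n1, n2, n3) for s in n.values()))
        if norm == 0.0:  # no edges: nothing can play the central role
            return {v: 0.0 for v in vertices}
        x1 = {v: s / norm for v, s in n1.items()}
        x2 = {v: s / norm for v, s in n2.items()}
        x3 = {v: s / norm for v, s in n3.items()}
    return x2


def similar_words(dictionary, word, top_k=5):
    """Rank the neighbors of `word` by central score (highest first)."""
    _, edges = build_dictionary_graph(dictionary)
    nodes, sub_edges = neighborhood(edges, word)
    scores = central_scores(nodes, sub_edges)
    return sorted(nodes, key=lambda w: (-scores[w], w))[:top_k]


if __name__ == "__main__":
    # Hypothetical toy dictionary: headword -> words used in its definition.
    toy = {
        "big": ["large", "great"],
        "large": ["big", "great", "size"],
        "huge": ["large", "big"],
        "great": ["large", "big"],
        "size": ["amount"],
    }
    print(similar_words(toy, "big", top_k=3))
```

On a real dictionary a neighborhood can hold hundreds of words, so a sparse-matrix version would be the natural next step; the dict-based loop here just keeps the four update rules visible.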

Time constraints require that this review be done in multiple posts. More tomorrow.
