Category: Taxonomy

Novelty Detection in Text Corpora

Detecting Novelty Using Text Analytics

Detecting novel events (new words, which signal new events) is one of the most important text analytics tasks, and an important step toward predictive analytics using text mining. On July 24, 2015, The New York Times (and many other news sources) published an article identifying the potential inclusion of classified information in the emails that Hillary Clinton had sent via private email and stored on her private email server. How would we use text…
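The idea that new words signal new events can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of the full post: it treats a term as "novel" if it never appears in a baseline corpus but recurs in incoming documents. The names `tokenize`, `novel_terms`, and `min_count`, the simple regex tokenizer, and the toy documents are all hypothetical choices made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical, deliberately simple tokenizer: lowercase, then keep
    # runs of letters, digits, and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def novel_terms(baseline_docs, new_docs, min_count=2):
    """Terms seen at least min_count times in new_docs but never in baseline_docs."""
    baseline_vocab = set()
    for doc in baseline_docs:
        baseline_vocab.update(tokenize(doc))
    new_counts = Counter()
    for doc in new_docs:
        new_counts.update(tokenize(doc))
    # A term counts as novel only if it is absent from the baseline AND recurs,
    # which filters out one-off typos and rare words.
    return {term: n for term, n in new_counts.items()
            if term not in baseline_vocab and n >= min_count}


if __name__ == "__main__":
    baseline = ["the senator sent an email", "the server stored the email"]
    incoming = ["classified information in the email",
                "classified material on the private server"]
    print(novel_terms(baseline, incoming))  # {'classified': 2}
```

Raising `min_count` trades sensitivity for robustness: a real system would also need a much larger baseline and some normalization (stemming, named-entity handling), which this sketch leaves out.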

Chapter 2 Review, Continued, Part 2 — "Automatic Discovery of Similar Words"

(Direct continuation of yesterday’s post on Senellart & Blondel’s “Automatic Discovery of Similar Words” in Survey of Text Mining II. The references I discuss in this post are listed at the end of the post.) In Chapter 2’s review of previous methods and associated literature, Senellart & Blondel start with the banal and get progressively more interesting. The one thing I found interesting in the first model that Senellart and Blondel discussed was that the model was…

"Automatic Discovery of Similar Words" – Chapter 2 in Survey of Text Mining II

This post begins a review of “Automatic Discovery of Similar Words,” by Pierre Senellart and Vincent D. Blondel, published as Chapter 2 in Berry and Castellanos’ Survey of Text Mining II. This is an excellent and useful chapter, in that it:
1) Addresses the broad issue of computational methods for discovering “similar words” (including synonyms, near-synonyms, and thesaurus-generating techniques) from large data corpora,
2) Illustrates the different leading mathematical methods, giving an excellent overview of the state of the art (SoA),
3) Competently discusses how different methods…
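To make “computational methods for discovering similar words” concrete, here is a minimal sketch of one generic family of such methods, distributional similarity: words that occur in similar contexts get similar co-occurrence vectors, which are then compared by cosine similarity. This is an assumed illustration only, not Senellart & Blondel’s own method; the function names, the window size of 2, and the toy sentences are hypothetical.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Map each word to counts of the words within +/- window positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    # Cosine similarity between two sparse count vectors (Counters).
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(word, vectors, top_n=3):
    """Rank other words by cosine similarity of their context vectors."""
    target = vectors.get(word, Counter())
    scores = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != word]
    # Sort by descending score, breaking ties alphabetically for stable output.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]


if __name__ == "__main__":
    sentences = ["the cat drinks milk", "the dog drinks water",
                 "the cat chases the mouse", "the dog chases the cat"]
    vecs = context_vectors(sentences)
    # "dog" ranks first for "cat", because both appear next to "the",
    # "drinks", and "chases".
    print(most_similar("cat", vecs))
```

On a toy corpus like this, frequent function words such as “the” dominate the vectors; practical systems typically weight counts (for example with TF-IDF or mutual information) and use far larger corpora.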