Travis Brown Digital Dialogue

January 1, 1970

MITH Conference Room

Many popular natural language processing techniques and tools rely on annotated training corpora to learn models that can be used to process new data from a similar domain. We can train a parser on Wall Street Journal text from the Penn Treebank, for example, and expect it to perform reasonably well on recent blog posts or movie reviews, but not necessarily on eighteenth-century conduct manuals. Unfortunately, it's often hard to find or create appropriate training data for specific literary genres or historical periods, even in English. In this talk, Travis Brown, Assistant Director of Research and Development at MITH, will look at some examples of semi-supervised and unsupervised methods that can be used to explore large text collections in domains with little or no available training data.