Topic Modeling Workshop

January 1, 1970

University of Maryland, College Park

Topic Modeling for Humanities Research, a one-day workshop directed by Dr. Jennifer Guiliano, Assistant Director of MITH, received a Level 1 Digital Humanities Start-Up Grant from the National Endowment for the Humanities on April 19, 2011. The workshop will offer an opportunity for cross-fertilization, information exchange, and collaboration among humanities scholars and researchers in natural language processing on the applications and methods of topic modeling. It will be organized into three primary areas: 1) an overview of how topic modeling is currently being used in the humanities; 2) an inventory of extensions of the latent Dirichlet allocation (LDA) model that have particular relevance for humanities research questions; and 3) a discussion of software implementations, toolkits, and interfaces.
Despite, or perhaps because of, the relatively widespread use of topic modeling for text analysis in the digital humanities, it is common to find examples of misapplication and misinterpretation of the technique and its output. There are several reasons for this: existing software packages generally have a steep learning curve, most humanists do not have a clear understanding of the underlying statistical methods and models, and documentation of best practices for applying the methods to humanities research questions remains limited. As a result, the most promising work in topic modeling is being done not by humanists exploring literary or historical corpora but by scholars working in natural language processing and information retrieval. This workshop will address these issues by giving humanists and scholars working in natural language processing an opportunity to jointly identify potential areas of research and development in the applications, extensions, and implementation of topic modeling.
Topic Modeling in the Humanities will provide humanities scholars with a deeper understanding of the vocabulary of LDA topic modeling (and other latent variable modeling methods) and best practices for interpreting the output of such analysis, and will articulate fundamental literary and historical questions for researchers outside of the humanities who are developing the models and methods (as well as the software implementations).