Topic Modeling Workshop

Topic Modeling for Humanities Research, a one-day workshop directed by Dr. Jennifer Guiliano, Assistant Director of MITH, received a Level 1 Digital Humanities Start-Up Grant from the National Endowment for the Humanities on April 19, 2011. The workshop will offer an opportunity for cross-fertilization, information exchange, and collaboration among humanities scholars and researchers in natural language processing on the subject of topic modeling applications and methods. The workshop will be organized into three primary areas: 1) an overview of how topic modeling is currently being used in the humanities; 2) an inventory of extensions of the latent Dirichlet allocation (LDA) model that have particular relevance for humanities research questions; and 3) a discussion of software implementations, toolkits, and interfaces.
Despite (or perhaps because of) the relatively widespread use of topic modeling for text analysis in the digital humanities, it is common to find examples of misapplication and misinterpretation of the technique and its output. There are a number of reasons for this: existing software packages generally have a significant learning curve, most humanists do not have a clear understanding of the underlying statistical methods and models, and there is still limited documentation of best practices for applying the methods to humanities research questions. As a result, the most promising work in topic modeling is being done not by humanists exploring literary or historical corpora but by scholars working in natural language processing and information retrieval. This workshop will address these issues by giving humanists and scholars working in natural language processing an opportunity to jointly identify potential areas of research and development in the application, extension, and implementation of topic modeling.
Topic Modeling in the Humanities will provide humanities scholars with a deeper understanding of the vocabulary of LDA topic modeling (and other latent variable modeling methods) and best practices for interpreting the output of such analysis, and will articulate fundamental literary and historical questions for researchers outside of the humanities who are developing the models and methods (as well as the software implementations).

Speakers

Matthew Jockers
Department of English and Center for Digital Research in the Humanities, University of Nebraska
Robert K. Nelson
Assistant Professor of the Digital Scholarship Lab, University of Richmond
Jordan Boyd-Graber
Assistant Professor, School of Information Studies and Institute for Advanced Computer Studies (UMIACS), University of Maryland
Jo Guldi
Department of History, Brown University
Christopher Johnson-Roberson
Ethnomusicology, Brown University
David Mimno
Postdoctoral Researcher, Princeton University