Large Scale Text Analysis in the Digital Humanities: Methods and Challenges
Faced with increasingly large digitized archives of text, the digital humanities community has responded with an avid interest in text mining and visualization. Everywhere one looks these days, computer scientists are bringing text analysis to humanities scholars with tutorials, workshops, and toolkits. Nevertheless, crucial information is being lost in translation. If text analysis toolkits are to be truly successful, information needs to start flowing the other way: computer scientists must learn from humanities scholars what humanistic text analysis really means. If they do not, they will continue making "natural" assumptions that do not always translate into the humanities, for example, that concepts like "question", "hypothesis", "data", and "evidence" are always well defined in scholars' minds and universal to all analysis. In the extreme case, this misalignment of basic assumptions could lead to fleets of powerful text analysis tools that nobody knows how to apply to humanistic analysis. In this talk, Aditi Muralidharan, Ph.D. candidate in Computer Science at UC Berkeley, will describe her experiences collaborating with English scholars to build the NEH-funded WordSeer text analysis toolkit, and will discuss both the differences between the ways computer scientists and humanities scholars view text analysis and the ways in which communication between the two fields can be improved.
Speakers
Aditi Muralidharan is a Ph.D. candidate in the Department of Computer Science at the University of California, Berkeley. She builds and researches systems for large-scale text analysis. This April, her work on the WordSeer project won the support of a 2011 NEH Startup Grant.