Over the last decade, the library community and other providers of digital collections have created an incredibly rich digital archive of historical and cultural materials. Yet most scholars have not figured out how to take full advantage of the digitized riches suddenly available on their computers.
Indeed, the abundance of digital documents has actually exacerbated the problems of some researchers who now find themselves overwhelmed by the sheer quantity of available material. Meanwhile, some of the most profound insights lurking in these digital corpora remain locked up.
We believe the absence of appropriate methods and interfaces is largely to blame: digital content providers have not yet developed the kind of sophisticated and flexible search, extraction, and analysis tools capable of capitalizing on this vast investment in a digitized cultural heritage.
These more sophisticated methods and interfaces require what computer scientists call “text mining and analysis,” which involves both the micro-analysis of texts (picking apart a chosen text by its constituent words) and the macro-analysis of texts (using software and algorithms to reveal patterns and relationships within and between documents). This sort of computational analysis can advance the current research process in history and the humanities.
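To make the distinction between micro-analysis and macro-analysis concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the function names and the sample documents are hypothetical, and cosine similarity over raw word counts is one simple stand-in for the much broader family of pattern-finding techniques the term covers.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Micro-analysis step: split one text into its lowercase constituent words."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(text):
    """Count how often each word occurs in a single text."""
    return Counter(tokenize(text))


def cosine_similarity(a, b):
    """Macro-analysis step: compare two word-count profiles (0.0 = no overlap)."""
    shared = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_pair(documents):
    """Return (name_1, name_2, score) for the two most similar documents.

    `documents` maps a document name to its text and must hold at least two entries.
    """
    counts = {name: word_counts(text) for name, text in documents.items()}
    pairs = combinations(sorted(counts), 2)
    first, second = max(
        pairs, key=lambda p: cosine_similarity(counts[p[0]], counts[p[1]])
    )
    return first, second, cosine_similarity(counts[first], counts[second])


if __name__ == "__main__":
    # Hypothetical sample corpus, invented purely for illustration.
    corpus = {
        "letter_1": "railroad strike in chicago",
        "letter_2": "chicago railroad workers strike",
        "letter_3": "recipe for apple pie",
    }
    print(word_counts(corpus["letter_1"]).most_common(3))
    print(most_similar_pair(corpus))
```

A real research tool would likely add steps this sketch omits, such as removing very common words, weighting rare terms more heavily, and scaling to collections far larger than three short texts.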