Credits/Sources

Birgit Tautz and Quyen Ha compiled this corpus by querying the University of Bielefeld's Digital Collections of the Journals of German Enlightenment.

Tesseract was used to perform Optical Character Recognition (OCR), converting the original page images into searchable, minable text files. Tesseract is an open-source OCR engine, originally developed at Hewlett-Packard Laboratories Bristol and now developed and maintained by its community on GitHub. Quyen Ha trained Tesseract for this specific corpus.

Text mining processes such as topic modeling, word frequency analysis, correlation analysis, and key-word-in-context analysis were carried out with R scripts. These scripts were written by Crystal Hall, adapted from code provided by Matt Jockers in "Text Analysis with R for Students of Literature" (Springer, 2014), and modified by Quyen Ha.

The interactive topic modeling visualization was created with LDAvis, a set of tools for creating an interactive web-based visualization of a topic model that has been fit to a corpus of text data using Latent Dirichlet Allocation (LDA). Given the estimated parameters of the topic model, LDAvis computes various summary statistics as input to an interactive visualization that is built with D3.js and viewed in a browser. LDAvis was created by Carson Sievert and Kenneth E. Shirley.

LDAvis builds its visualization with the D3 library (version 3.0). D3.js is a JavaScript library for producing dynamic, interactive data visualizations in web browsers. It was developed by Mike Bostock, Jason Davies, Jeffrey Heer, Vadim Ogievetsky, and others.

This website's theme was created using Bootstrap. Bootstrap is a free and open-source front-end library for designing websites and web applications. It contains HTML- and CSS-based design templates for typography, forms, buttons, navigation and other interface components, as well as optional JavaScript extensions.