(from the introductory section of the 2021 syllabus)
Who wrote that anonymous memo? Can we track COVID infections by analyzing Google searches? Do Facebook status updates provide clues about whether U.S. regional dialects are increasingly blending together? Are wars foreshadowed by low-level conflict reported in online media? And how do conscious and unconscious biases show up in newspaper reporting?
Scholars have investigated all of these questions — and many more like them — using computational tools that identify patterns in language and text, taking as inputs the growing volumes of data that are gathered daily from our devices, computers, and smartphones. The combination of these tools and these data has become known as the “big data revolution”, and it is transforming our understanding of the world in which we live.
This course covers text mining and language data analysis in an interdisciplinary manner accessible to humanities and social science students without a computer science background. While a basic familiarity with Python programming is a prerequisite, a much more important requirement is a lively curiosity about the answers and insights that can be extracted from big data.
Data science techniques are remarkably powerful at helping us find patterns in large quantities of text, and their results can illuminate important questions in exciting new ways. However, the texts a society produces encapsulate its prevailing mores, biases, and norms. As a result, text-as-data analyses are uniquely positioned to identify such biases and, at the same time, especially vulnerable to unwittingly reproducing them or producing biased outcomes. The likely presence of biases in both our source material and our analyses, and the importance of staying aware of these pitfalls, will be an important thread throughout the course.
I most recently offered this course in the fall of 2021.