The advantages of lexicon-based sentiment analysis in an age of machine learning
Fine-grained, generalizable, and domain independent
By A. Maurits van der Veen and Erik Bleich
Keywords: sentiment analysis, text-as-data, computational methods, natural language processing
October 19, 2024
Abstract
Assessing whether texts are positive or negative (sentiment analysis) has wide-ranging applications across many disciplines. Automated approaches make it possible to code nearly unlimited quantities of text rapidly, replicably, and with high accuracy. Compared to machine learning and large language model (LLM) approaches, lexicon-based methods may sacrifice some performance, but in exchange they provide generalizability and domain independence, and, crucially, they make it possible to identify gradations in sentiment. We demonstrate the strong performance of lexica using MultiLexScaled, an approach that averages valences across a number of widely used general-purpose lexica. We validate it against benchmark datasets from a range of domains, comparing its performance against machine learning and LLM alternatives. In addition, we illustrate the value of identifying fine-grained sentiment levels by showing, in an analysis of pre- and post-9/11 British press coverage of Muslims, that binarized valence metrics lead to different (and erroneous) conclusions about the nature of the post-9/11 shock as well as about differences between broadsheet and tabloid coverage. The code to apply MultiLexScaled is available online.
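To make the idea of averaging valences across lexica concrete, here is a minimal sketch in Python. It is not the authors' MultiLexScaled code: the two toy lexica, the function names, and the scaling step (dividing each lexicon's score by its largest absolute valence) are all assumptions made for illustration. The sketch also shows how binarizing a score discards the gradations the abstract emphasizes.

```python
from statistics import mean

# Hypothetical toy lexica (word -> valence). Real general-purpose lexica
# are far larger and use their own scales; these values are invented.
TOY_LEXICA = {
    "lexicon_a": {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0},
    "lexicon_b": {"good": 0.5, "great": 1.0, "bad": -0.5, "awful": -1.0},
}


def lexicon_valence(tokens, lexicon):
    """Mean valence of the tokens found in one lexicon, rescaled to [-1, 1].

    Assumption: scaling by the lexicon's largest absolute valence is a
    stand-in for whatever scaling the actual method applies.
    """
    hits = [lexicon[t] for t in tokens if t in lexicon]
    if not hits:
        return 0.0  # no lexicon words matched: treat as neutral
    scale = max(abs(v) for v in lexicon.values())
    return mean(hits) / scale


def multi_lexicon_valence(text, lexica=TOY_LEXICA):
    """Average the per-lexicon scaled valences for one text."""
    tokens = text.lower().split()  # naive whitespace tokenization
    return mean(lexicon_valence(tokens, lex) for lex in lexica.values())


def binarize(score):
    """Collapse a graded score to -1, 0, or 1, losing its magnitude."""
    return 1 if score > 0 else -1 if score < 0 else 0


if __name__ == "__main__":
    for text in ("a good day", "a great day", "a terrible day"):
        score = multi_lexicon_valence(text)
        print(text, score, binarize(score))
```

In this toy setup, "a good day" and "a great day" receive different graded scores but the same binarized label, which is the kind of information loss the abstract argues can change substantive conclusions.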