The advantages of lexicon-based sentiment analysis in an age of machine learning

Fine-grained, generalizable, and domain independent

By A. Maurits van der Veen and Erik Bleich
Keywords: sentiment analysis, text-as-data, computational methods, natural language processing

October 19, 2024


Abstract

Sentiment analysis, the task of assessing whether texts are positive or negative, has wide-ranging applications across many disciplines. Automated approaches make it possible to code nearly unlimited quantities of text rapidly, replicably, and with high accuracy. Compared to machine learning and large language model (LLM) approaches, lexicon-based methods may sacrifice some performance, but in exchange they provide generalizability and domain independence, and, crucially, they make it possible to identify gradations in sentiment. We demonstrate the strong performance of lexica using MultiLexScaled, an approach that averages valences across a number of widely used general-purpose lexica. We validate it against benchmark datasets from a range of domains, comparing its performance against machine learning and LLM alternatives. In addition, we illustrate the value of identifying fine-grained sentiment levels with an analysis of pre- and post-9/11 British press coverage of Muslims: binarized valence metrics give rise to different (and erroneous) conclusions about the nature of the post-9/11 shock, as well as about differences between broadsheet and tabloid coverage. The code to apply MultiLexScaled is available online.
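
To make the core idea concrete, the following is a minimal sketch of averaging valences across several lexica and then comparing the graded score with a binarized one. It is not the MultiLexScaled code: the two toy lexica, their valence values, and the function names (`lexicon_score`, `multi_lexicon_score`, `binarize`) are invented for illustration, and any scaling or calibration step the actual method applies is omitted because the abstract does not describe it.

```python
import re

# Toy lexica invented for illustration only (an assumption, not real data);
# real general-purpose lexica contain thousands of entries.
TOY_LEXICA = [
    {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0},
    {"good": 0.5, "great": 1.0, "bad": -0.5, "awful": -1.0},
]


def lexicon_score(tokens, lexicon):
    """Sum the valences of matched tokens and divide by the token count."""
    if not tokens:
        return 0.0
    return sum(lexicon.get(tok, 0.0) for tok in tokens) / len(tokens)


def multi_lexicon_score(text, lexica=TOY_LEXICA):
    """Average the per-lexicon scores of a text (hypothetical helper)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon_score(tokens, lex) for lex in lexica]
    return sum(scores) / len(scores)


def binarize(score):
    """Collapse a graded score to +1, -1, or 0, discarding its magnitude."""
    return 1 if score > 0 else -1 if score < 0 else 0


mild = multi_lexicon_score("good movie")
strong = multi_lexicon_score("great movie")
print(mild, strong, binarize(mild), binarize(strong))
```

In this toy setting the two texts receive different graded scores but the same binarized label, which is the kind of information loss the abstract attributes to binarized valence metrics.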

