
Translation4tru (Translation for the Rest of Us)

Embedding-based word-level translation

Automated machine translation has come a long way, and neural machine translation (as used e.g. by Google Translate) is now very good. However, translating large quantities of text for research purposes remains resource-intensive. I make available bilingual word-level translation dictionaries that exceed readily available alternatives in both size and quality. Translation4tru includes these dictionaries, along with Python notebooks for applying them to translate text, as well as notebooks and code for developing translation dictionaries for additional language pairs.
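To give a sense of what word-level, dictionary-based translation involves, here is a minimal sketch in Python. It is not the code from the Translation4tru notebooks: the function name `translate_word_level`, the `keep_unknown` parameter, and the tiny German-English dictionary are all hypothetical and used only for illustration. It assumes a dictionary that maps lowercase source words to single target words.

```python
import re

# Hypothetical toy dictionary for illustration only; a real bilingual
# dictionary would contain many thousands of entries loaded from a file.
TOY_DE_EN = {"das": "the", "haus": "house", "ist": "is", "klein": "small"}

# Split text into word tokens and single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def translate_word_level(text, dictionary, keep_unknown=True):
    """Translate text one token at a time using a lookup dictionary.

    Tokens missing from the dictionary are kept as-is when keep_unknown
    is True and dropped otherwise. Word order is left unchanged.
    """
    translated_tokens = []
    for token in TOKEN_RE.findall(text.lower()):
        if token in dictionary:
            translated_tokens.append(dictionary[token])
        elif keep_unknown:
            translated_tokens.append(token)
    return " ".join(translated_tokens)


translated = translate_word_level("Das Haus ist klein.", TOY_DE_EN)
print(translated)  # the house is small .
```

Because each word is looked up independently, the output keeps the source word order and ignores context. That is usually acceptable for bag-of-words methods, which discard word order anyway, but it does not produce fluent text.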

Illustration by M.C. Escher - Tower of Babel (fair use)


High-quality dictionary-based sentiment analysis

Sentiment analysis (the measurement of the positivity or negativity of texts) is one of the most widely used tools in computational text analysis. MultiLexScaled is a sentiment analysis tool that can be applied off-the-shelf, has been shown to work well across a range of domains, and performs comparably to dedicated machine learning applications.
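For readers unfamiliar with dictionary-based sentiment analysis, the general idea can be sketched in a few lines of Python. This is a generic illustration and not MultiLexScaled's actual procedure: the function `sentiment_score` and the four-word lexicon are hypothetical, and the scoring rule (summed word valences divided by the number of words) is just one simple choice among many.

```python
import re

# Hypothetical toy lexicon for illustration only: positive words get +1,
# negative words get -1. Real sentiment lexicons are far larger.
TOY_LEXICON = {"good": 1.0, "great": 1.0, "bad": -1.0, "terrible": -1.0}


def sentiment_score(text, lexicon):
    """Return the summed valence of all words divided by the word count.

    Words not in the lexicon contribute 0. An empty text scores 0.0.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(lexicon.get(word, 0.0) for word in words) / len(words)


positive = sentiment_score("A great and good day", TOY_LEXICON)
negative = sentiment_score("A terrible day", TOY_LEXICON)
print(positive > 0, negative < 0)  # True True
```

Dividing by the number of words keeps scores comparable across texts of different lengths; raw sums would otherwise grow with document length.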