Translation4tru (Translation for the Rest of Us)

Embedding-based word-level translation

Automated machine translation has come a long way, and neural machine translation (as used e.g. by Google Translate) is now very good. However, translating large quantities of text for research purposes remains resource-intensive. I make available bilingual word-level translation dictionaries that exceed any readily available alternatives in both size and quality. Translation4tru includes these dictionaries, along with Python notebooks for applying them to translate text, as well as notebooks and code for developing translation dictionaries for additional language pairs.
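To give a rough idea of what word-level dictionary translation involves, here is a minimal sketch. It is not the code from the Translation4tru notebooks: the toy dictionary, the function name `translate_words`, and the `keep_unknown` parameter are all assumptions made for illustration.

```python
import re

# Hypothetical toy French-to-English dictionary, for illustration only.
# A real bilingual dictionary would be loaded from a file instead.
TOY_DICTIONARY = {
    "le": "the",
    "chat": "cat",
    "noir": "black",
    "dort": "sleeps",
}


def translate_words(text, dictionary, keep_unknown=True):
    """Translate text one word at a time using a bilingual dictionary.

    Words missing from the dictionary are kept as-is when keep_unknown
    is True, and dropped otherwise.
    """
    tokens = re.findall(r"\w+", text.lower())
    translated = []
    for token in tokens:
        if token in dictionary:
            translated.append(dictionary[token])
        elif keep_unknown:
            translated.append(token)
    return " ".join(translated)


result = translate_words("Le chat noir dort", TOY_DICTIONARY)
print(result)  # the cat black sleeps
```

Word order is carried over unchanged from the source language, so output of this kind is suited to bag-of-words style analyses rather than to reading.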

Illustration by M.C. Escher - Tower of Babel (fair use)

MultiLexScaled

High-quality dictionary-based sentiment analysis

Sentiment analysis (the measurement of the positivity or negativity of texts) is one of the most widely used tools in computational text analysis. MultiLexScaled is a sentiment analysis tool that can be applied off-the-shelf, has been shown to work well across a range of domains, and obtains performance comparable to dedicated machine learning applications.
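For readers new to dictionary-based sentiment analysis, the general idea can be sketched in a few lines. This is a generic illustration and not the MultiLexScaled implementation: the toy lexicon, its valence values, and the function name `sentiment_score` are assumptions.

```python
import re

# Hypothetical toy lexicon mapping words to valence scores
# (positive values for positive words, negative for negative words).
TOY_LEXICON = {
    "good": 1.0,
    "great": 2.0,
    "bad": -1.0,
    "terrible": -2.0,
}


def sentiment_score(text, lexicon):
    """Return the mean valence per word; words not in the lexicon count as 0."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    total = sum(lexicon.get(token, 0.0) for token in tokens)
    return total / len(tokens)


print(sentiment_score("A good day", TOY_LEXICON))      # positive score
print(sentiment_score("A terrible day", TOY_LEXICON))  # negative score
```

Dividing by the number of words keeps scores comparable across texts of different lengths. Simple sketches like this ignore negation and intensifiers, which a full tool would need to handle.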

Dynamic maps using pandas

Dynamic choropleth maps

Dynamic charts and maps are great for displaying patterns over time. A key challenge with such charts and maps is keeping the mapping of values to colours constant over time. Most charting programs automatically assign colours based on the range and number of distinct values in the data. This is problematic when both of those factors vary over time, because we want the colour representing a particular value or level to remain the same in every frame. I have added a feature to Jack McKew’s excellent pandas-alive dynamic charting program to make this possible for choropleth maps, and provide a sample notebook that applies this feature to mapping Covid case and death rates over time (using the New York Times data).
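The underlying idea of a constant value-to-colour mapping can be sketched independently of any charting library. The snippet below does not use the pandas-alive API; the bin edges, colour codes, region names, and the function name `colour_for` are assumptions chosen for illustration.

```python
from bisect import bisect_right

# Hypothetical fixed bin edges and one colour per bin. Because these are set
# once, up front, they do not depend on the values present in any one frame.
BIN_EDGES = [10, 50, 100]
COLOURS = ["#ffffcc", "#fd8d3c", "#e31a1c", "#800026"]  # one more colour than edges


def colour_for(value):
    """Map a value to a colour using the fixed bins, identical for every frame."""
    return COLOURS[bisect_right(BIN_EDGES, value)]


# Toy data: rates per region at two points in time (made-up numbers).
frames = {
    "2020-04": {"Region A": 5, "Region B": 60},
    "2020-12": {"Region A": 60, "Region B": 150},
}

coloured = {
    date: {region: colour_for(rate) for region, rate in rates.items()}
    for date, rates in frames.items()
}

# A rate of 60 gets the same colour in both frames, even though the range
# of values in each frame is different.
print(coloured["2020-04"]["Region B"] == coloured["2020-12"]["Region A"])  # True
```

With automatic scaling, the rate of 60 would be the darkest colour in the first frame and the lightest in the second. Fixing the bins in advance avoids that.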