Lost in translation
When misunderstandings lead to fake news
Media sites often translate quotes and names using automated tools, and mistakes creep in. Sometimes it's humans who inadvertently make translation mistakes. We'd like to find ways to highlight such errors and offer alternative suggestions on media sites. How can translation mistakes be highlighted and suggestions for improvement offered? Challenge proposed by Patrick Boehler, editor at SWI swissinfo.ch.
Our pitch at #swihack: youtube.com
Live demo: http://translationese-detector.surge.sh/
This project was created during the Swissinfo (SWI) Multilingual Hackathon in Bern, Switzerland. The theme of the hackathon was to empower linguistic diversity in newsrooms. The hackathon was held on Feb. 21-22, 2020, to coincide with International Mother Language Day.
Media sites often translate quotes and names using automated tools, and mistakes creep in. Sometimes it's humans who inadvertently make translation mistakes. This project aims to find ways to highlight such errors and offer alternative suggestions on media sites.
Key question: How can translation mistakes be highlighted and suggestions for improvement offered?
Translation mistakes that occur frequently
- Idioms - An idiom is a phrase or expression that has a figurative, or sometimes literal, meaning.
- Translations of tense
- Change of meaning and style
- "Translationese" - Awkwardness or ungrammaticality in a translation, for example caused by overly literal translation of idioms or syntax.
- "False Friends" - False friends are words in different languages that look or sound similar but differ significantly in meaning. For example, English "embarrassed" vs. Spanish "embarazada", which means "pregnant".
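To illustrate the false-friend category, here is a minimal sketch of a dictionary-based check. It is not the project's actual implementation: the table `FALSE_FRIENDS_EN_ES` and the functions `tokenize` and `findFalseFriends` are hypothetical names, and the table contains only the single example given above. The idea is to flag a sentence pair when a source word and its known false friend both appear.

```javascript
// Hypothetical lookup table: English word -> Spanish false friend and what
// that Spanish word really means. Only this one entry comes from the text
// above; a usable list would need many more pairs.
const FALSE_FRIENDS_EN_ES = new Map([
  ["embarrassed", { falseFriend: "embarazada", actualMeaning: "pregnant" }],
]);

// Lowercase a sentence and split it into runs of Unicode letters.
function tokenize(sentence) {
  return sentence.toLowerCase().match(/\p{L}+/gu) || [];
}

// Return one warning per source word whose known false friend
// appears in the translated sentence.
function findFalseFriends(source, translation, table = FALSE_FRIENDS_EN_ES) {
  const translatedWords = new Set(tokenize(translation));
  const warnings = [];
  for (const word of tokenize(source)) {
    const entry = table.get(word);
    if (entry && translatedWords.has(entry.falseFriend)) {
      warnings.push({ sourceWord: word, ...entry });
    }
  }
  return warnings;
}

// Example: the translation uses "embarazada" for "embarrassed".
console.log(findFalseFriends("I am so embarrassed.", "Estoy tan embarazada."));
```

An exact-match lookup like this misses inflected forms (for instance "embarazado"), so a fuller version would probably need stemming or lemmatization.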
A list of sentences, each with a "good" and a "bad" translation, was collected. Based on that dataset, multiple checks were implemented. The results of these checks are visualized in an interface, which allows authors to quickly identify potentially flawed translations and fix them.
- Wordcount per sentence (implemented)
- Wordcount per document
- False friend detection (implemented)
- Sentiment analysis
- Negation detection
- Idiom detection
- Readability (Flesch index)
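As an illustration of the simplest check in the list, word count per sentence, the sketch below compares the number of words in a source sentence with the number in its translation and flags pairs that differ strongly. This is an assumption about how such a check could work, not the project's code: the names `countWords` and `checkWordCounts`, the `{ source, translation }` pair shape, and the default threshold `maxRatio = 1.5` are all hypothetical.

```javascript
// Count words by splitting on whitespace. This assumes a language that
// separates words with spaces; it will not work for Chinese or Japanese.
function countWords(sentence) {
  return sentence.trim().split(/\s+/).filter(Boolean).length;
}

// For each aligned sentence pair, compute the ratio between the longer and
// the shorter word count and mark the pair as suspicious above maxRatio.
// maxRatio is an assumed threshold and would need tuning on real data.
function checkWordCounts(pairs, maxRatio = 1.5) {
  return pairs.map(({ source, translation }) => {
    const sourceWords = countWords(source);
    const translationWords = countWords(translation);
    const longer = Math.max(sourceWords, translationWords);
    // Guard against division by zero for empty sentences.
    const shorter = Math.max(1, Math.min(sourceWords, translationWords));
    const ratio = longer / shorter;
    return { sourceWords, translationWords, ratio, suspicious: ratio > maxRatio };
  });
}

// Example with two made-up sentence pairs.
const report = checkWordCounts([
  { source: "one two three four", translation: "eins zwei drei vier" },
  { source: "one two", translation: "eins zwei drei vier fünf sechs" },
]);
console.log(report);
```

A ratio is used rather than an absolute difference so that long and short sentences are treated alike. Language pairs differ in typical length, so a single threshold is only a rough first filter.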
- Currently, only a few checks are implemented. The project could be extended with more checks in the future; the list above shows what could be implemented next.
- The source dataset is very small. It should be extended with more data, which would make it possible to improve the checks further.
- The current solution cannot detect bad translations dynamically. It is not possible to edit the translations directly in the interface and see whether a translation has improved.
    git clone firstname.lastname@example.org:manuelroth/lost-in-translation.git
    cd lost-in-translation/client/
    npm run dev
    npm install --global surge
    cd lost-in-translation/client/
    npm run build
    surge public