
Lost in translation

When misunderstandings lead to fake news


Challenge

Media sites often translate quotes and names using automated tools, and mistakes creep in. Sometimes it's humans who inadvertently make translation mistakes. We'd like to find ways to highlight such errors and offer alternative suggestions on media sites. How can we highlight translation mistakes and offer suggestions for improvement? Challenge proposed by Patrick Boehler, editor at SWI swissinfo.ch.

Result

Presentation: docs.google.com/presentation

Our pitch at #swihack: youtube.com

Live demo: http://translationese-detector.surge.sh/

Code: https://github.com/manuelroth/lost-in-translation

Screenshot

This project was created during the Swissinfo (SWI) Multilingual Hackathon in Bern, Switzerland. The theme of the hackathon was empowering linguistic diversity in newsrooms, and it was held on International Mother Language Day (Feb. 21-22, 2020).

Description

Media sites often translate quotes and names using automated tools, and mistakes creep in. Sometimes it's humans who inadvertently make translation mistakes. This project aims to find ways to highlight such errors and offer alternative suggestions on media sites.

Key question: How can we highlight translation mistakes and offer suggestions for improvement?

Translation mistakes that occur frequently

  • Idioms - An idiom is a phrase or expression that has a figurative, or sometimes literal, meaning.
  • Quotes
  • Negations
  • Translations of tense
  • Change of meaning and style
  • "Translationese" - Awkwardness or ungrammaticality of translation, such as due to overly literal translation of idioms or syntax.
  • "False Friends" - False friends are words in different languages that look or sound similar, but differ significantly in meaning (English: embarrassed -> Spanish: embarazada (which means pregnant))

Approach

A list of sentences with a "good" and a "bad" translation was collected. Based on that dataset, multiple checks were implemented. The results of these checks are visualized in an interface that allows authors to quickly identify potentially flawed translations and fix them.
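
To make the approach concrete, here is a rough sketch of how such a dataset and a set of checks could fit together. The field names, the check interface, and the example check (a crude negation detector, one of the ideas listed below) are assumptions for illustration, not the project's actual data format or code.

// Hypothetical shape of the collected dataset: each entry pairs a source
// sentence with a "good" and a "bad" translation (illustrative example only).
const dataset = [
  {
    source: 'Er hat das nicht gesagt.', // "He did not say that."
    good: 'He did not say that.',
    bad: 'He did say that.', // negation lost in translation
  },
];

// Each check takes a source/translation pair and returns warning strings.
// This one is a simplistic negation check using small keyword lists.
const checks = [
  ({ source, translation }) => {
    const sourceNegated = /\b(nicht|kein|nie)\b/i.test(source);
    const translationNegated = /\b(not|never|no)\b|n't/i.test(translation);
    return sourceNegated !== translationNegated
      ? ['Negation may have been lost or added in translation']
      : [];
  },
];

// Collect all warnings for one pair, e.g. to highlight them in the interface.
function runChecks(pair) {
  return checks.flatMap((check) => check(pair));
}

console.log(runChecks({ source: dataset[0].source, translation: dataset[0].bad }));
// -> ['Negation may have been lost or added in translation']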

Potential checks

  • Word count per sentence (implemented; a rough sketch follows this list)
  • Word count per document
  • False friend detection (implemented)
  • Sentiment analysis
  • Negation detection
  • Idiom detection
  • Readability - Flesch reading-ease index
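
The check marked as implemented, word count per sentence, could look roughly like the sketch below. This is our own illustration of the idea, not the exact code from the repository; the threshold and sentence splitting are deliberately simplistic.

// Rough sketch of a "word count per sentence" check: split both texts into
// sentences, compare the lengths of corresponding sentences, and flag pairs
// whose length ratio diverges strongly.
function wordCountPerSentence(originalText, translatedText, threshold = 1.5) {
  const toSentences = (text) =>
    text.split(/(?<=[.!?])\s+/).map((s) => s.trim()).filter(Boolean);
  const countWords = (sentence) => sentence.split(/\s+/).filter(Boolean).length;

  const originals = toSentences(originalText);
  const translations = toSentences(translatedText);
  const warnings = [];

  originals.forEach((sentence, i) => {
    const translated = translations[i];
    if (!translated) return; // differing sentence counts would need their own check
    const ratio = countWords(translated) / countWords(sentence);
    if (ratio > threshold || ratio < 1 / threshold) {
      warnings.push({ sentence: i + 1, ratio: Number(ratio.toFixed(2)) });
    }
  });
  return warnings;
}

console.log(
  wordCountPerSentence(
    'Das ist ein kurzer Satz.',
    'This is, to put it in a rather roundabout way, a very short sentence.'
  )
);
// -> [ { sentence: 1, ratio: 2.8 } ]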

Potential improvements

  • Currently only a few checks are implemented. The project could be extended with more checks in the future; see the list above for what could be implemented next.
  • The source dataset is very small. It should be extended with more data, which would allow the checks to be improved further.
  • The current solution doesn't detect bad translations dynamically. It's not possible to edit the translations directly in the interface and see whether the translation improves.

Develop

git clone git@github.com:manuelroth/lost-in-translation.git
cd lost-in-translation/client/
npm install
npm run dev

Deploy

npm install --global surge
cd lost-in-translation/client/
npm run build
surge public


