Lost in translation

When misunderstandings lead to fake news

Challenge

Media sites often translate quotes and names using automated tools, and mistakes creep in. Sometimes it's humans who inadvertently make translation mistakes. We'd like to find ways to highlight such errors and offer alternative suggestions on media sites. How can translation mistakes be highlighted and suggestions for improvement offered? Challenge proposed by Patrick Boehler, editor at SWI swissinfo.ch

Result

Presentation: docs.google.com/presentation

Our pitch at #swihack: youtube.com

Live demo: http://translationese-detector.surge.sh/

Code: https://github.com/manuelroth/lost-in-translation

This project was created during the Swissinfo (SWI) Multilingual Hackathon in Bern, Switzerland. The theme of the hackathon was to empower linguistic diversity in newsrooms. The hackathon was held on Feb. 21-22, 2020, to coincide with International Mother Language Day.

Description

Media sites often translate quotes and names using automated tools, and mistakes creep in. Sometimes it's humans who inadvertently make translation mistakes. This project aims to find ways to highlight such errors and offer alternative suggestions on media sites.

Key question: How can translation mistakes be highlighted and suggestions for improvement offered?

Translation mistakes that occur frequently

  • Idioms - An idiom is a phrase or expression that has a figurative, or sometimes literal, meaning.
  • Quotes
  • Negations
  • Translations of tense
  • Change of meaning and style
  • "Translationese" - Awkwardness or ungrammaticality of translation, such as due to overly literal translation of idioms or syntax.
  • "False Friends" - Words in different languages that look or sound similar but differ significantly in meaning (for example, English "embarrassed" vs. Spanish "embarazada", which means "pregnant").
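To make the false-friend case concrete, here is a minimal Python sketch of a lookup-based check. It is not taken from the project's repository: the table `FALSE_FRIENDS_EN_ES`, the helper `tokenize`, and the function `find_false_friends` are hypothetical names, and only the embarrassed/embarazada pair comes from the list above. A real check would need a much larger, curated word list per language pair.

```python
import re

# Hypothetical lookup table: (source word, suspicious target word) -> note.
# Only this single entry comes from the text; a real list would be far longer.
FALSE_FRIENDS_EN_ES = {
    ("embarrassed", "embarazada"): "'embarazada' means 'pregnant'",
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def find_false_friends(source, translation, table=FALSE_FRIENDS_EN_ES):
    """Return a note for every false-friend pair found in both sentences."""
    src_words = set(tokenize(source))
    tgt_words = set(tokenize(translation))
    return [
        f"{src} -> {tgt}: {note}"
        for (src, tgt), note in table.items()
        if src in src_words and tgt in tgt_words
    ]


# Made-up example sentences, for illustration only.
hits = find_false_friends("I was so embarrassed.", "Estaba tan embarazada.")
print(hits)
```

The check only fires when the source word and its look-alike both appear, so a correct translation of "embarrassed" is left alone. It matches exact word forms, so inflected variants would need stemming or extra table entries.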

Approach

A list of sentences, each with a "good" and a "bad" translation, was collected. Based on that dataset, multiple checks were implemented. The results of these checks are visualized in an interface that allows authors to quickly identify potentially flawed translations and fix them.
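As an illustration of what one such check could look like, the sketch below compares the word count of a source sentence with that of its translation and flags pairs whose lengths diverge strongly. It is an assumption-laden example, not the project's implementation: the function names, the thresholds `low=0.5` and `high=2.0`, and the sentence pairs are all made up for this sketch.

```python
import re


def word_count(sentence):
    """Count word tokens in a sentence."""
    return len(re.findall(r"\w+", sentence))


def length_ratio_flag(source, translation, low=0.5, high=2.0):
    """Return True if the translation/source word-count ratio is outside
    [low, high]. The thresholds are arbitrary assumptions and would need
    tuning per language pair."""
    src = word_count(source)
    tgt = word_count(translation)
    if src == 0 or tgt == 0:
        # An empty side is always suspicious.
        return True
    ratio = tgt / src
    return ratio < low or ratio > high


# Made-up sentence pairs, for illustration only.
pairs = [
    ("The minister did not comment on the report.",
     "Der Minister äusserte sich nicht zum Bericht."),
    ("The minister did not comment on the report.",
     "Kein Kommentar."),
]
for source, translation in pairs:
    print(length_ratio_flag(source, translation), "|", translation)
```

A length mismatch does not prove a translation is wrong; it only marks a sentence as worth a second look, which fits the goal of pointing authors at potentially flawed passages.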

Potential checks

  • Wordcount per sentence (implemented)
  • Wordcount per document
  • False friend detection (implemented)
  • Sentiment analysis
  • Negation detection
  • Idiom detection
  • Readability (Flesch index)
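One of the checks not yet implemented, negation detection, could be sketched roughly as follows. This is a hypothetical Python example under simplifying assumptions: the cue lists in `NEGATION_CUES` are short and incomplete, the English/German language pair is chosen only for illustration, and the functions `count_negations` and `negation_mismatch` are not part of the project.

```python
import re

# Illustrative, incomplete cue lists; assumed for this sketch only.
NEGATION_CUES = {
    "en": {"not", "no", "never", "nobody", "nothing", "neither", "nor"},
    "de": {"nicht", "kein", "keine", "keinen", "nie", "niemals",
           "niemand", "nichts"},
}


def count_negations(sentence, lang):
    """Count negation cue words in a sentence of the given language."""
    # Expand contractions such as "didn't" into "did not"
    # (straight apostrophes only).
    text = sentence.lower().replace("n't", " not")
    tokens = re.findall(r"\w+", text)
    return sum(token in NEGATION_CUES[lang] for token in tokens)


def negation_mismatch(source, translation, src_lang="en", tgt_lang="de"):
    """Return True if exactly one of the two sentences contains a negation."""
    src_negated = count_negations(source, src_lang) > 0
    tgt_negated = count_negations(translation, tgt_lang) > 0
    return src_negated != tgt_negated


# Made-up example: the negation is lost in the translation.
print(negation_mismatch("The council didn't approve the budget.",
                        "Der Rat hat das Budget genehmigt."))
```

Comparing only presence or absence keeps the check simple, but it misses double negations and negation expressed through prefixes or verbs, so it should be read as a hint rather than a verdict.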

Potential improvements

  • Only a few checks are currently implemented. The project could be extended with more checks; the list above shows what could be implemented next.
  • The source dataset is very small. Extending it with more data would make it possible to improve the checks further.
  • The current solution cannot detect bad translations dynamically. It is not possible to edit a translation directly in the interface and see whether it has improved.

Develop

git clone git@github.com:manuelroth/lost-in-translation.git
cd lost-in-translation/client/
npm install
npm run dev

Deploy

npm install --global surge
cd lost-in-translation/client/
npm run build
surge public

23.02.2020 13:45 ~ manuelroth

Worked on documentation

22.02.2020 12:15

Hackathon finished

21.02.2020 12:42

Team forming

Susan has joined!

21.02.2020 10:59

Team forming

d_boyle has joined!

21.02.2020 08:00

Hackathon started

20.02.2020 13:46

Team forming

AKohler has joined!

20.02.2020 13:46

Team forming

Sharon has joined!

20.02.2020 13:45

Team forming

manuelroth has joined!

20.02.2020 13:41

Team forming

philipkueng has joined!

07.01.2020 06:07 ~ patrickboehler

Worked on documentation

06.01.2020 15:35

Team forming

Julie Hunt has joined!

13.11.2019 12:10

Team forming

patrickboehler has joined!

13.11.2019 12:10

Project started

Initialized by patrickboehler 🎉

Creative Commons Licence: The contents of this website, unless otherwise stated, are licensed under a Creative Commons Attribution 4.0 International License.