Multilingual comment comparison

In this project we want to explore how comments in different languages differ.

Our pitch at #swihack:

SWI swissinfo.ch is the international unit of the Swiss Broadcasting Corporation and provides independent reporting on Switzerland. It reaches about one million users worldwide every month, and these users can comment on the articles.

Our goal is to compare the comments based on their:

  • language
  • topic
  • number
  • publication date
  • length
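
As a rough illustration of these criteria, the sketch below groups comments by language and reports their number, mean length, and count per month. The records and the field names ("language", "topic", "date", "text") are hypothetical and are not taken from the real data set.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical flat comment records; the field names are assumptions,
# not the actual schema of the project's data.
comments = [
    {"language": "en", "topic": "politics", "date": date(2020, 1, 14),
     "text": "Interesting article, thank you."},
    {"language": "en", "topic": "economy", "date": date(2020, 2, 3),
     "text": "I disagree."},
    {"language": "de", "topic": "politics", "date": date(2020, 2, 10),
     "text": "Sehr guter Beitrag."},
]


def summarize_by_language(records):
    """Return {language: {"count", "mean_length", "per_month"}}."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["language"]].append(record)

    summary = {}
    for lang, items in grouped.items():
        # Count comments per calendar month (YYYY-MM) to expose time trends.
        per_month = defaultdict(int)
        for record in items:
            per_month[record["date"].strftime("%Y-%m")] += 1
        summary[lang] = {
            "count": len(items),
            # Length is measured in characters here; words would work as well.
            "mean_length": mean(len(r["text"]) for r in items),
            "per_month": dict(per_month),
        }
    return summary


summary = summarize_by_language(comments)
for lang, stats in summary.items():
    print(lang, stats)
```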

Based on these criteria, we would first be able to answer the following questions:

  • Do any language communities comment more on specific topics?
  • Are there time trends in multilingual comments?
  • Do any language communities comment more or less positively on specific articles?
  • Are positive/negative comments influenced by translation quality (human translation vs. automatic translation + post-editing)?
  • Who is the most hated/loved author/translator?
  • Are articles in specific languages longer or more detailed?
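
The first question could be approached along the following lines. This is a minimal sketch that assumes every comment already carries a language and a topic label; the (language, topic) pairs are made up for illustration. Shares are used instead of raw counts because language communities may differ a lot in total comment volume.

```python
from collections import Counter, defaultdict

# Hypothetical (language, topic) pairs, one per comment.
pairs = [
    ("en", "politics"),
    ("en", "politics"),
    ("en", "culture"),
    ("de", "economy"),
    ("de", "politics"),
]


def topic_shares(pairs):
    """Return {language: {topic: share of that language's comments}}."""
    counts = defaultdict(Counter)
    for lang, topic in pairs:
        counts[lang][topic] += 1

    shares = {}
    for lang, topic_counts in counts.items():
        total = sum(topic_counts.values())
        shares[lang] = {t: n / total for t, n in topic_counts.items()}
    return shares


shares = topic_shares(pairs)
for lang, by_topic in shares.items():
    print(lang, by_topic)
```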

Then, we could conduct a sentiment analysis based on the words used in the articles and, finally, represent words as real-valued vectors in a predefined vector space (word vector embedding).
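
To make these two ideas concrete, here is a toy sketch, not the method used in the project. It scores a text with a tiny hand-made English word list and compares words through the cosine similarity of their vectors. The word lists and the two-dimensional vectors are invented for illustration; a real analysis would rely on per-language lexicons or trained embeddings.

```python
import math
import re

# Tiny illustrative word lists (assumptions, English only).
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}


def sentiment_score(text):
    """Return (positive - negative word count) / token count, in [-1, 1]."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    return (pos - neg) / len(tokens)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Made-up 2-dimensional "embeddings"; real ones have hundreds of dimensions.
TOY_VECTORS = {
    "good": [0.9, 0.1],
    "great": [0.8, 0.2],
    "bad": [-0.7, 0.3],
}

print(sentiment_score("Great article, I love it"))
print(cosine_similarity(TOY_VECTORS["good"], TOY_VECTORS["great"]))
print(cosine_similarity(TOY_VECTORS["good"], TOY_VECTORS["bad"]))
```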


We've provided the "articles.json" file in the GitHub repository, containing ~450 articles in different languages with language-specific comments. Feel free to add your own analysis. We will gather more articles during the hackathon and upload them periodically.

The JSON file contains a list of articles. Each article has all of its available language versions in the "content" tag, together with all of their comments.
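
A minimal sketch of walking such a structure is shown below. Only the top-level list and the "content" tag come from the description above. That "content" maps language codes to versions, and that each version holds a "comments" list with a "text" field, are assumptions; the inline sample stands in for the real file.

```python
import json

# Inline stand-in for articles.json. Everything below "content" is an
# assumed layout, not the documented schema.
SAMPLE = """
[
  {
    "id": 1,
    "content": {
      "en": {"title": "Example", "comments": [{"text": "Nice read."}]},
      "de": {"title": "Beispiel",
             "comments": [{"text": "Danke."}, {"text": "Gut."}]}
    }
  }
]
"""


def iter_comments(articles):
    """Yield (language, comment) pairs across all articles and versions."""
    for article in articles:
        for lang, version in article.get("content", {}).items():
            for comment in version.get("comments", []):
                yield lang, comment


# With the real file, one would instead read it from disk, for example:
#   with open("articles.json", encoding="utf-8") as fh:
#       articles = json.load(fh)
articles = json.loads(SAMPLE)

counts = {}
for lang, _comment in iter_comments(articles):
    counts[lang] = counts.get(lang, 0) + 1
print(counts)
```

If the real layout differs, only iter_comments needs to change; the counting code stays the same.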

Challenge by Valentina V. Baldassarre, Damian Murezzan, Samuel Pawel and Hubert Zumwald.


Repository for the multilingual comment comparison project at the swihack hackathon 2020

Event finished

22.02.2020 12:15

Edited content

22.02.2020 09:33 ~ hubihack

Joined the team

21.02.2020 16:47 ~ Damian

Edited content

21.02.2020 14:57 ~ Samuel

Joined the team

21.02.2020 14:13 ~ hubihack

Edited content

21.02.2020 14:07 ~ VVBaldassarre

Joined the team

21.02.2020 11:27 ~ VVBaldassarre

Challenge posted

21.02.2020 11:27 ~ Samuel

Event started

21.02.2020 08:00

The contents of this website, unless otherwise stated, are licensed under a Creative Commons Attribution 4.0 International License.

Multilingual Media