Multilingual comment comparison

In this project we want to explore how comments on swissinfo.ch differ across languages.


Data Science notebook | Sources

Our pitch at #swihack: youtube.com

Challenge

Swissinfo.ch is the international unit of the Swiss Broadcasting Corporation and provides independent reporting on Switzerland. It reaches about one million users worldwide every month, and these users can comment on the articles.

Our goal is to compare the comments based on their:

  • language
  • topic
  • number
  • publication date
  • length
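As a rough illustration of how two of these criteria (number and length) could be compared per language, here is a minimal sketch. It assumes the comments have already been flattened into records with hypothetical "language" and "text" fields; these field names and the sample records are made up for illustration and are not taken from the actual data.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_language(comments):
    """Return {language: {"count": n, "mean_length": average characters}}."""
    lengths = defaultdict(list)
    for comment in comments:
        # Length is measured in characters here; word counts would also work.
        lengths[comment["language"]].append(len(comment["text"]))
    return {
        language: {"count": len(values), "mean_length": mean(values)}
        for language, values in lengths.items()
    }


# Made-up sample records, not real swissinfo.ch comments.
sample = [
    {"language": "en", "text": "Interesting article."},
    {"language": "en", "text": "I disagree."},
    {"language": "de", "text": "Sehr guter Beitrag!"},
]

summary = summarize_by_language(sample)
for language, stats in sorted(summary.items()):
    print(language, stats["count"], stats["mean_length"])
```

Character length is only a crude proxy, since scripts such as Japanese or Chinese pack more meaning into fewer characters, so lengths are best compared within a language rather than across languages.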

Using these criteria, we could first answer the following questions:

  • Do any language communities comment more on specific topics?
  • Are there time trends in multilingual comments on swissinfo.ch?
  • Do any language communities comment more or less positively on specific articles?
  • Are positive/negative comments influenced by translation quality (human translation vs. automatic translation + post-editing)?
  • Who is the most hated/loved author/translator?
  • Are articles in specific languages longer or more detailed?
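The first question could be approached along the following lines. This is only a sketch: it assumes flat comment records with hypothetical "language" and "topic" fields and uses made-up sample data. It reports the share of each topic within a language rather than raw counts, because language communities may differ a lot in overall comment volume.

```python
from collections import Counter, defaultdict


def topic_shares(comments):
    """Return {language: {topic: fraction of that language's comments}}."""
    counts = defaultdict(Counter)
    for comment in comments:
        counts[comment["language"]][comment["topic"]] += 1

    shares = {}
    for language, topic_counts in counts.items():
        total = sum(topic_counts.values())
        # Normalise within each language so that communities of
        # different sizes can be compared.
        shares[language] = {
            topic: n / total for topic, n in topic_counts.items()
        }
    return shares


# Made-up sample records, not real swissinfo.ch comments.
sample = [
    {"language": "en", "topic": "politics"},
    {"language": "en", "topic": "politics"},
    {"language": "en", "topic": "science"},
    {"language": "en", "topic": "science"},
    {"language": "ja", "topic": "science"},
]

shares = topic_shares(sample)
print(shares)
```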

Then we could conduct a sentiment analysis based on the words used in the articles, and finally represent words as real-valued vectors in a predefined vector space (word vector embeddings).
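To make these two ideas concrete, here is a toy sketch of a lexicon-based sentiment score and of comparing word vectors by cosine similarity. The lexicon, the two-dimensional vectors, and the function names are all invented for illustration. A real analysis would need a sentiment lexicon or model for each language and pretrained embeddings, neither of which is specified by the project.

```python
import math
import re

# Tiny hypothetical English lexicon: +1 for positive, -1 for negative words.
LEXICON = {
    "good": 1, "great": 1, "love": 1,
    "bad": -1, "terrible": -1, "hate": -1,
}


def sentiment_score(text, lexicon=LEXICON):
    """Average polarity of the lexicon words in text, in [-1, 1]."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [lexicon[token] for token in tokens if token in lexicon]
    # Texts without any lexicon word are treated as neutral.
    return sum(hits) / len(hits) if hits else 0.0


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Made-up 2-dimensional "embeddings"; real ones have hundreds of dimensions.
TOY_VECTORS = {
    "good": [0.9, 0.1],
    "great": [0.8, 0.2],
    "bad": [-0.9, 0.1],
}

print(sentiment_score("I love this great article"))
print(cosine_similarity(TOY_VECTORS["good"], TOY_VECTORS["great"]))
print(cosine_similarity(TOY_VECTORS["good"], TOY_VECTORS["bad"]))
```

In this toy space, words with similar polarity point in similar directions, which is the property that makes embeddings useful as input for sentiment models.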

IF YOU WANT TO PARTICIPATE:

We've provided the "articles.json" file in the GitHub repository, containing ~450 articles in different languages with language-specific comments. Feel free to add your own analysis. We will gather more articles during the hackathon and upload them periodically.

The JSON file contains a list of articles. Each article holds all of its available language versions under the "content" key, together with all of their comments.
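A possible starting point for reading the file is sketched below. Only the top-level list and the "content" key come from the description above; everything else is an assumption. In particular, the sketch guesses that "content" maps a language code to a version object with a "comments" list whose entries have a "text" field. Inspect the real file first and adapt the hypothetical names accordingly.

```python
import json


def iter_comments(articles):
    """Yield one flat record per comment from a list of articles."""
    for article in articles:
        # ASSUMPTION: "content" is a dict keyed by language code.
        for language, version in article.get("content", {}).items():
            # ASSUMPTION: each version has a "comments" list of dicts
            # with a "text" field.
            for comment in version.get("comments", []):
                yield {"language": language, "text": comment.get("text", "")}


def load_comments(path="articles.json"):
    """Read the JSON file and return all comments as flat records."""
    with open(path, encoding="utf-8") as handle:
        return list(iter_comments(json.load(handle)))


# Made-up example that follows the assumed structure, not real data.
example = json.loads("""
[
  {"content": {
     "en": {"comments": [{"text": "Nice piece."}, {"text": "Thanks!"}]},
     "fr": {"comments": [{"text": "Merci."}]},
     "de": {"comments": []}
  }}
]
""")

records = list(iter_comments(example))
print(records)
```

Flattening the nested structure into one record per comment keeps later steps, such as grouping by language, independent of the file layout.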

Challenge by Valentina V. Baldassarre, Damian Murezzan, Samuel Pawel and Hubert Zumwald.

swihack-comments

Repository for the multilingual comment comparison project at the swihack hackathon 2020

Updated 21:40 22.02.2020
Maintained by Samuel

  • 09:33 22.02.2020 / hubihack / update
  • 09:31 22.02.2020 / hubihack / update
  • 09:24 22.02.2020 / hubihack / update
  • 09:17 22.02.2020 / Samuel / update
  • 09:16 22.02.2020 / Samuel / update

The contents of this website, unless otherwise stated, are licensed under a Creative Commons Attribution 4.0 International License.