Multilingual comment comparison
In this project we want to explore how comments in different languages on swissinfo.ch differ
Our pitch at #swihack: youtube.com
Swissinfo.ch is the international unit of the Swiss Broadcasting Corporation and provides independent reporting on Switzerland. It reaches about one million users worldwide every month, and readers can comment on the articles.
Our goal is to compare the comments based on their:
- publication date
Based on these criteria, we would first be able to answer the following questions:
- Do any language communities comment more on specific topics?
- Are there time trends in multilingual comments on swissinfo.ch?
- Do any language communities comment more or less positively on specific articles?
- Are positive/negative comments influenced by translation quality (human translation vs. automatic translation + post-editing)?
- Who is the most hated/loved author/translator?
- Are articles in specific languages longer or more detailed?
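As a rough illustration of the time-trend question, comments could be counted per language and per month of publication. This is only a sketch: the record keys `language` and `date`, the date format, and the sample records are assumptions made for the example and are not taken from the project's data.

```python
from collections import Counter
from datetime import datetime


def comments_per_month(comments):
    """Count comments per (language, "YYYY-MM") pair.

    Assumes each comment is a dict with hypothetical keys
    "language" and "date" (ISO format, YYYY-MM-DD).
    """
    counts = Counter()
    for comment in comments:
        published = datetime.strptime(comment["date"], "%Y-%m-%d")
        counts[(comment["language"], published.strftime("%Y-%m"))] += 1
    return counts


# Illustrative records only, not real swissinfo.ch data
sample_comments = [
    {"language": "en", "date": "2020-01-15"},
    {"language": "en", "date": "2020-01-20"},
    {"language": "en", "date": "2020-02-03"},
    {"language": "de", "date": "2020-01-07"},
]

monthly_counts = comments_per_month(sample_comments)
for (language, month), n in sorted(monthly_counts.items()):
    print(language, month, n)
```

Sorting the resulting pairs gives one small time series per language, which can then be plotted or compared across language communities.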
Then we could conduct a sentiment analysis based on the words used in the articles and, finally, represent words as real-valued vectors in a predefined vector space (word vector embedding).
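One simple way to start a word-based sentiment analysis is a lexicon count: the share of positive words minus the share of negative words in a text. The sketch below is only an assumption about how this could look. The word lists `POSITIVE` and `NEGATIVE` are tiny hypothetical examples in English, and a multilingual analysis would need a separate lexicon (or a trained model) per language.

```python
import re

# Toy lexicons for illustration only (hypothetical, English only)
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "poor"}


def sentiment_score(text):
    """Return (positive - negative) word count divided by total word count.

    The score lies in [-1, 1]; empty text scores 0.0.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    positive = sum(token in POSITIVE for token in tokens)
    negative = sum(token in NEGATIVE for token in tokens)
    return (positive - negative) / len(tokens)


print(sentiment_score("Great article, I love it"))
print(sentiment_score("Terrible and bad"))
```

A score above zero suggests a mostly positive text and a score below zero a mostly negative one; averaging the scores per language would give a first, rough comparison.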
IF YOU WANT TO PARTICIPATE:
We've provided the "articles.json" file in the GitHub repository, containing ~450 articles in different languages with language-specific comments. Feel free to add your own analysis. We will gather more articles over the course of the hackathon and upload them periodically.
The JSON file contains a list of articles. Each article holds all of its available language versions in the "content" tag, together with all comments.
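A minimal sketch of a first analysis, counting comments per language, could look like the following. Only the top-level list and the "content" tag come from the description above. The exact layout inside "content" is an assumption: the code treats it as a list of language versions with the hypothetical keys `language` and `comments`, so the key names will likely need adjusting to the real file.

```python
import json
from collections import Counter


def count_comments_per_language(articles):
    """Count comments per language across all articles.

    Assumed layout (not confirmed): article["content"] is a list of
    language versions, each a dict with hypothetical keys
    "language" and "comments".
    """
    counts = Counter()
    for article in articles:
        for version in article.get("content", []):
            language = version.get("language", "unknown")
            counts[language] += len(version.get("comments", []))
    return counts


# Small in-memory sample in the assumed layout (illustrative only)
sample_articles = [
    {"content": [
        {"language": "en", "comments": [{"text": "Great article"}, {"text": "Thanks"}]},
        {"language": "de", "comments": [{"text": "Sehr gut"}]},
    ]},
    {"content": [
        {"language": "en", "comments": []},
        {"language": "it", "comments": [{"text": "Interessante"}]},
    ]},
]

comment_counts = count_comments_per_language(sample_articles)
print(comment_counts.most_common())

# With the real file, the list would be loaded like this instead:
# with open("articles.json", encoding="utf-8") as f:
#     articles = json.load(f)
```

Using `.get` with defaults keeps the loop from failing on articles that lack a language version or have no comments.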
Challenge by Valentina V. Baldassarre, Damian Murezzan, Samuel Pawel and Hubert Zumwald.
Repository for the multilingual comment comparison project at the swihack hackathon 2020