SENTIDA: A New Tool for Sentiment Analysis in Danish

Abstract

In the midst of the Era of Big Data, tools for analysing and processing unstructured data are needed more than ever. Sentiment analysis is one such tool, and it has seen both a substantial rise in popularity and major developmental progress. However, sentiment analysis tools for Danish have not developed as rapidly as those for, e.g., English. Few Danish tools exist, and the ones available are often either ineffective or outdated. Moreover, authoritative validation tests in low-resource languages are missing, which is why little can be deduced about the competence of current Danish models. We present SENTIDA, a simple and effective model for general sentiment analysis in Danish, and compare its competence to AFINN, the current benchmark within the field of Danish sentiment analysis. Combining a lexical approach with several incorporated functions, we construct SENTIDA and categorise it as a domain-independent sentiment analysis tool focusing on polarity strength. Subsequently, we run different validation tests, including a binary classification test of Trustpilot reviews and a correlation test based on manually rated texts from different domains. The results show that SENTIDA excels across all tests, predicting reviews with an accuracy above 80% in all trials and providing significant correlations with manually annotated texts.
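
As a rough illustration of what a lexical approach to polarity strength can look like, the following minimal Python sketch scores a text by averaging the sentiment values of the words it finds in a lexicon, then thresholds that score for binary classification. It is not SENTIDA's implementation: the lexicon entries, their scores, and the names `TOY_LEXICON`, `polarity`, and `classify` are all hypothetical and chosen only for this example.

```python
import re

# Hypothetical toy lexicon: the words and scores below are illustrative
# assumptions, not entries taken from SENTIDA or AFINN.
TOY_LEXICON = {
    "god": 2.0,         # "good"
    "fantastisk": 4.0,  # "fantastic"
    "dårlig": -3.0,     # "bad"
    "langsom": -1.0,    # "slow"
}


def polarity(text: str) -> float:
    """Return the mean lexicon score of the matched words (0.0 if none match)."""
    tokens = re.findall(r"\w+", text.lower())
    scores = [TOY_LEXICON[token] for token in tokens if token in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def classify(text: str) -> str:
    """Binary label from the polarity score; a score of 0.0 falls to 'negative'."""
    return "positive" if polarity(text) > 0 else "negative"


if __name__ == "__main__":
    for review in ["Fantastisk god service", "Dårlig og langsom levering"]:
        print(review, polarity(review), classify(review))
```

A real tool would use a far larger, validated lexicon and would typically add rules on top of the plain word lookup; the sketch only shows the basic scoring and thresholding idea.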

Publication
Journal of Language Works - Sprogvidenskabeligt Studentertidsskrift, 4(1)
Jacob Aarup Dalsgaard
PhD Student, Algorithmic Fairness
