Computational Analysis of Emotions and Topics in Wikivir
Synopsis
This article presents a computational analysis of themes and emotions in the corpus of Slovenian literature freely available on Wikivir, the Slovenian Wikisource. It describes how the corpus was built, and the corpus itself is made freely available to other researchers. It also describes the method used to analyze emotions and predominant topics in large text collections. The research identifies the main thematic emphases across time periods, genres, and authors, as well as the predominant emotions, using both a lexical approach and sentiment analysis. The digitized Slovenian literature on Wikivir is an invaluable resource. Even so, it has been underused in large-scale research based on natural language processing for three reasons: the complexity of MediaWiki markup, the dispersion of the texts transferred to the site, and the difficulty of converting and processing them. For this research, a corpus of freely accessible Slovenian literature was compiled. It comprises just over 62 million words from 22,919 texts, which Wikivir annotates (inconsistently) with more than two thousand categories, i.e., metadata such as author, year, century, and genre. These metadata are also being systematized using automated approaches, which gives a more accurate picture of how the texts published on Wikivir are distributed.
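The synopsis does not spell out how the lexical approach to emotions works. One common form of such an approach is lexicon-based counting: each token is looked up in an emotion lexicon and the hits per emotion are tallied. The sketch below illustrates this under stated assumptions and is not the authors' implementation. The lexicon `EMOTION_LEXICON`, its entries and labels, and the functions `tokenize` and `emotion_profile` are all hypothetical. Scores are expressed per 1,000 tokens so that texts of different lengths can be compared.

```python
import re
from collections import Counter

# HYPOTHETICAL toy lexicon (word form -> emotion labels), for illustration only.
# A real study would use a full Slovenian emotion lexicon.
EMOTION_LEXICON = {
    "veselje": {"joy"},
    "srečen": {"joy"},
    "strah": {"fear"},
    "jeza": {"anger"},
    "žalost": {"sadness"},
    "solze": {"sadness"},
}


def tokenize(text: str) -> list:
    """Lowercase the text and split it into word tokens.

    In Python 3, \\w matches Unicode letters, so č, š and ž are kept.
    """
    return re.findall(r"\w+", text.lower())


def emotion_profile(text: str, lexicon: dict = EMOTION_LEXICON) -> dict:
    """Return lexicon hits per emotion, normalized per 1,000 tokens."""
    tokens = tokenize(text)
    if not tokens:
        return {}
    counts = Counter()
    for tok in tokens:
        # A token may carry several emotion labels; unknown tokens contribute none.
        for emotion in lexicon.get(tok, ()):
            counts[emotion] += 1
    return {emo: 1000 * n / len(tokens) for emo, n in counts.items()}


if __name__ == "__main__":
    print(emotion_profile("Strah in žalost, nato veselje."))
```

The lookup here matches exact word forms only. Slovenian is highly inflected, so a form such as "veselja" would be missed. A realistic pipeline would therefore lemmatize the text before the lookup, which is an assumption this sketch leaves out.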
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.