A Guide to Frequency Lists from the Gigafida 2.0 and GOS 1.0 Corpora

Jaka Čibej; Špela Arhar Holdt; Kaja Dobrovoljc; Simon Krek

doi:10.4312/9789610604006

Authors

Jaka Čibej

University of Ljubljana, Faculty of Computer and Information Science, Slovenia

Špela Arhar Holdt

University of Ljubljana, Faculty of Computer and Information Science, Slovenia

Kaja Dobrovoljc

University of Ljubljana, Faculty of Arts, Slovenia

Simon Krek

Jožef Stefan Institute, Ljubljana, Slovenia

DOI: https://doi.org/10.4312/9789610604006

Keywords:

written Slovene, spoken Slovene, LIST program, CLARIN.SI repository, language corpora

Synopsis

The research project “The New Grammar of Modern Standard Slovene: Resources and Methods” was carried out by researchers at the Jožef Stefan Institute and at the Faculty of Arts and the Faculty of Computer and Information Science of the University of Ljubljana. The goal of the project was to define a linguistic methodological basis for the computational analysis of written and spoken Slovene as represented in modern Slovene language corpora. Based on these new methods, a series of open-access corpus-based databases was generated; these can serve as a basis for preparing an empirical grammatical description of modern Slovene and for developing language technologies for Slovene.

The purpose of this publication is to provide a quick overview of the data available in the CLARIN.SI repository and to demonstrate the functions and uses of the LIST program, which can also be applied to other corpora to extract similar frequency lists. The guide presents a short excerpt of each available frequency list, i.e. the table header and approximately 30 lines, and each table includes a link to the data in the repository. Each subsection of a chapter begins with a short description of the conditions used in the extraction. The guide is available in Slovene and English.
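To illustrate what a frequency list excerpt of this kind looks like, here is a minimal Python sketch that counts word forms in a tokenized text and prints a table header followed by the most frequent rows. It is not the LIST program and does not reproduce its options or output format; the function names, the column names `word_form` and `frequency`, and the sample sentence are assumptions made for illustration only.

```python
from collections import Counter


def frequency_list(tokens, lowercase=True):
    """Return (word form, absolute frequency) pairs, most frequent first."""
    if lowercase:
        tokens = (t.lower() for t in tokens)
    counts = Counter(tokens)
    # Sort by descending frequency, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def excerpt(rows, header=("word_form", "frequency"), max_rows=30):
    """Format a header line plus at most max_rows rows, tab-separated."""
    lines = ["\t".join(header)]
    lines += [f"{form}\t{freq}" for form, freq in rows[:max_rows]]
    return "\n".join(lines)


# Hypothetical, whitespace-tokenized sample; real corpora are far larger
# and are typically tokenized and annotated beforehand.
sample_tokens = "To je hiša in to je vrt".split()
rows = frequency_list(sample_tokens)
print(excerpt(rows))
```

The sketch counts only surface word forms. Lists based on lemmas or other annotations would need annotated input, which is outside the scope of this example.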

Published

December 30, 2020

Series

Sporazumevanje

How to Cite

Čibej, J., Arhar Holdt, Š., Dobrovoljc, K., & Krek, S. (2020). A Guide to Frequency Lists from the Gigafida 2.0 and GOS 1.0 Corpora. University of Ljubljana Press. https://doi.org/10.4312/9789610604006
