A Guide to Frequency Lists from the Gigafida 2.0 and GOS 1.0 Corpora

Jaka Čibej; Špela Arhar Holdt; Kaja Dobrovoljc; Simon Krek

doi:10.4312/9789610604006

Authors

Jaka Čibej

University of Ljubljana, Faculty of Computer and Information Science, Slovenia

Špela Arhar Holdt

University of Ljubljana, Faculty of Computer and Information Science, Slovenia

Kaja Dobrovoljc

University of Ljubljana, Faculty of Arts, Slovenia

Simon Krek

Jožef Stefan Institute, Ljubljana, Slovenia

DOI: https://doi.org/10.4312/9789610604006

Keywords:

written Slovene, spoken Slovene, LIST program, CLARIN.SI repository, language corpora

Abstract

The research project “The New Grammar of Modern Standard Slovene: Resources and Methods” was carried out by researchers from the Jožef Stefan Institute and from the Faculty of Arts and the Faculty of Computer and Information Science of the University of Ljubljana. The goal of the project was to define a linguistic methodological basis for the computational analysis of written and spoken Slovene as represented in modern Slovene language corpora. Based on these new methods, a series of open-access corpus-based databases was generated; these can serve as a basis for preparing an empirical grammatical description of modern Slovene and for developing language technologies for Slovene.

The purpose of this publication is to provide a quick overview of the data made available in the CLARIN.SI repository and to demonstrate the functions and uses of the LIST program, which can also be applied to other corpora to extract similar frequency lists. The guide features short excerpts of all available frequency lists, i.e. the table header and approximately 30 lines. Each table also includes a link to the data in the repository. Each subsection of a chapter begins with a short description of the conditions used in the extraction. The guide is available in Slovene and English.

Downloads

PDF

Published

30 December 2020

Series

Sporazumevanje

How to cite

Čibej, J., Arhar Holdt, Š., Dobrovoljc, K., & Krek, S. (2020). A Guide to Frequency Lists from the Gigafida 2.0 and GOS 1.0 Corpora. Založba Univerze v Ljubljani. https://doi.org/10.4312/9789610604006
