Korpusi slovenskega jezika Gigafida, KRES, ccGigafida in ccKRES: gradnja, vsebina, uporaba

Nataša Logar Berginc; Miha Grčar; Marko Brakus; Tomaž Erjavec; Špela Arhar Holdt; Simon Krek

doi:10.4312/9789610603542

Authors

Nataša Logar Berginc, University of Ljubljana, Faculty of Social Sciences, Slovenia; Miha Grčar; Marko Brakus; Tomaž Erjavec, Jožef Stefan Institute, Ljubljana, Slovenia; Špela Arhar Holdt, University of Ljubljana, Faculty of Computer and Information Science, Slovenia; Simon Krek, Jožef Stefan Institute, Ljubljana, Slovenia

DOI: https://doi.org/10.4312/9789610603542

Keywords:

reference corpora, text reception, text production, internet texts, language technologies

Synopsis

One of the aims of the Communication in Slovene project (2008-2013) was to compile a reference corpus of written Slovene. The outcome was the Gigafida corpus, which contains over 1 billion words and is an upgrade of two earlier corpora of Slovene: the FIDA corpus (2000) and the FidaPLUS corpus (2006).

All collected texts, together with those from the FIDA and FidaPLUS corpora, were included in the Gigafida corpus; a more balanced distribution of genres was planned and realized in a 100-million-word corpus called KRES. In addition, we built two subcorpora that are available under a Creative Commons licence (“Attribution-NonCommercial-ShareAlike”): the first (ccGigafida) contains 9% of Gigafida, and the second (ccKRES) contains 9% of KRES.

Published

August 28, 2020

Series

Sporazumevanje

How to Cite

Logar Berginc, N., Grčar, M., Brakus, M., Erjavec, T., Arhar Holdt, Š., & Krek, S. (2020). Korpusi slovenskega jezika Gigafida, KRES, ccGigafida in ccKRES: gradnja, vsebina, uporaba. University of Ljubljana Press. https://doi.org/10.4312/9789610603542
