start
Differences
This shows you the differences between two versions of the page.
start [2020/04/14 15:05] – simone | start [2022/06/27 09:21] – external edit 127.0.0.1 | ||
---|---|---|---|
Line 2: | Line 2: | ||
===== The project ===== | ===== The project ===== | ||
- | The linguistic | + | The data underlying the corpus was collected in 2014 to form the database of the research project " |
+ | ===== Using the corpus ===== | ||
+ | [[https:// | ||
===== The corpus ===== | ===== The corpus ===== | ||
Line 10: | Line 12: | ||
* Number of chats: 617 | * Number of chats: 617 | ||
* Number of messages (with permission to be used): 763’644 | * Number of messages (with permission to be used): 763’644 | ||
+ | * Number of informants (who gave their permission): | ||
* Number of tokens: 5' | * Number of tokens: 5' | ||
* Number of emojis: 382' | * Number of emojis: 382' | ||
+ | |||
+ | The corpus is made up of chats in all four national languages of Switzerland, | ||
+ | |||
+ | Available languages: | ||
+ | * fra: French | ||
+ | * ita: Italian | ||
+ | * roh: any variety of Romansh | ||
+ | * gsw: dialectal German as used in Switzerland | ||
+ | * deu: non-dialectal German | ||
+ | * eng: English | ||
+ | * spa: Spanish | ||
+ | * sla: any Slavic language | ||
+ | |||
+ | Romansh varieties: | ||
+ | |||
+ | * roh-ja: Jauer Romansh | ||
+ | * roh-sr: Romontsch Sursilvan | ||
+ | * roh-st: Rumàntsch Sutsilvan | ||
+ | * roh-sm: Rumantsch Surmiran | ||
+ | * roh-pt: Rumauntsch Puter | ||
+ | * roh-vl: Rumantsch Vallader | ||
+ | * roh-gr: Rumantsch Grischun | ||
+ | |||
+ | |||
More information about the corpus can be found in the section [[01_corpus: | More information about the corpus can be found in the section [[01_corpus: | ||
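The language and variety codes listed above can be kept in a small lookup table when working with annotated data. The following is a minimal sketch in Python, not part of the corpus tooling: the names `LANGUAGES`, `ROMANSH_VARIETIES`, `describe_language` and `base_language` are hypothetical, and only the codes and labels themselves come from this page.

```python
# Codes and labels are copied from the lists on this page.
# All identifiers below are hypothetical helpers, not a corpus API.
LANGUAGES = {
    "fra": "French",
    "ita": "Italian",
    "roh": "any variety of Romansh",
    "gsw": "dialectal German as used in Switzerland",
    "deu": "non-dialectal German",
    "eng": "English",
    "spa": "Spanish",
    "sla": "any Slavic language",
}

ROMANSH_VARIETIES = {
    "roh-ja": "Jauer Romansh",
    "roh-sr": "Romontsch Sursilvan",
    "roh-st": "Rumàntsch Sutsilvan",
    "roh-sm": "Rumantsch Surmiran",
    "roh-pt": "Rumauntsch Puter",
    "roh-vl": "Rumantsch Vallader",
    "roh-gr": "Rumantsch Grischun",
}


def describe_language(code: str) -> str:
    """Return the label for a language code or a Romansh variety code."""
    normalized = code.strip().lower()
    if normalized in ROMANSH_VARIETIES:
        return ROMANSH_VARIETIES[normalized]
    if normalized in LANGUAGES:
        return LANGUAGES[normalized]
    raise KeyError(f"unknown language code: {code!r}")


def base_language(code: str) -> str:
    """Collapse a variety code such as 'roh-sr' to its base code 'roh'."""
    return code.strip().lower().split("-", 1)[0]
```

Collapsing variety codes with `base_language` is one way to count all Romansh messages together while still keeping the finer distinction available; whether the corpus annotation stores the codes in exactly this form is an assumption.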
Line 17: | Line 44: | ||
[[https:// | [[https:// | ||
- | ===== Using the corpus ===== | + | |
- | This corpus is freely available for academic, non-commercial research. When using the corpus, please make sure to cite it correctly. | + |
Line 28: | Line 54: | ||
==== This documentation ==== | ==== This documentation ==== | ||
- | Ueberwasser, | + | Stark, Elisabeth; |
- | ==== Overview and statistics | + | ==== Creation of the corpus |
- | For an overview of the languages in the corpus, consult: | + | Ueberwasser, |
==== The project ==== | ==== The project ==== | ||
- | Stark, Elisabeth (2016-2019). SNSF project | + | Stark, Elisabeth (2016-2020). //SNSF project |
+ | ===== Raw data ===== | ||
+ | If you want to use our raw data for computational linguistic projects, please contact [[estark@rom.uzh.ch|Prof. Elisabeth Stark]] to check whether your project complies with our requirements. If we make the data available, it is released under a CC BY-NC-ND license. | ||
start.txt · Last modified: 2022/09/12 19:19 by Stefan Bircher