01_corpus:start
Differences
This shows you the differences between two versions of the page.
01_corpus:start [2020/04/17 20:08] – ↷ Links adapted because of a move operation simone | 01_corpus:start [2022/06/27 09:21] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== 1. THE CORPUS ====== | ====== 1. THE CORPUS ====== | ||
- | The corpus consists of 617 chats that were sent in by the Swiss population in 2014 through a fixed procedure that was communicated in the press in order to get people interested. The individual chats were checked for their [[01_corpus: | + | The corpus consists of 617 chats that were sent in by the Swiss population in 2014 through a fixed procedure that was communicated in the press in order to get people interested. The individual chats were checked for their [[01_corpus: |
- | Next processing steps comprised [[01_corpus: | + | Next processing steps comprised [[01_corpus: |
+ | Our authentic WhatsApp chats were gathered in summer 2014. Not all of them made it into the corpus (e.g. duplicates, or chats and messages without permission). In its present form, the corpus comprises: | ||
+ | |||
+ | * Number of chats: 617 | ||
+ | * Number of messages (with permission to be used): 763’644 | ||
+ | * Number of informants (who gave their permission): | ||
+ | * Number of tokens: 5' | ||
+ | * Number of emojis: 382' | ||
+ | |||
+ | The corpus is made up of chats in all four national languages of Switzerland, |
+ | |||
+ | Available languages: | ||
+ | * fra: French | ||
+ | * ita: Italian | ||
+ | * roh: Any variety of Romansh | ||
+ | * gsw: dialectal German as used in Switzerland | ||
+ | * deu: non-dialectal German | ||
+ | * eng: English | ||
+ | * spa: Spanish | ||
+ | * sla: Any Slavic language | ||
+ | |||
+ | Romansh varieties: | ||
+ | |||
+ | * roh-ja: Jauer Romansh | ||
+ | * roh-sr: romontsch sursilvan | ||
+ | * roh-st: rumàntsch sutsilvan | ||
+ | * roh-sm: rumantsch surmiran | ||
+ | * roh-pt: rumauntsch puter | ||
+ | * roh-vl: rumantsch vallader | ||
+ | * roh-gr: rumantsch grischun | ||
+ | |||
+ | The tool used to browse is [[https:// | ||
+ | |||
+ | Krause, Thomas & Zeldes, Amir (2016): ANNIS3: A new architecture for generic corpus query and visualization. In: Digital Scholarship in the Humanities 2016 (31). [[http:// |
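
The language and variety codes listed above can be held in a small lookup table when working with exported data. The following Python sketch is illustrative only: the code strings and labels are copied from the lists above, but the names `LANGUAGES`, `ROMANSH_VARIETIES`, `describe_code` and `base_language` are hypothetical, and it is an assumption that annotations in the corpus use exactly these string forms.

```python
# Labels copied from the lists above. All names below are illustrative,
# not part of the corpus tooling.
LANGUAGES = {
    "fra": "French",
    "ita": "Italian",
    "roh": "Any variety of Romansh",
    "gsw": "dialectal German as used in Switzerland",
    "deu": "non-dialectal German",
    "eng": "English",
    "spa": "Spanish",
    "sla": "Any Slavic language",
}

ROMANSH_VARIETIES = {
    "roh-ja": "Jauer Romansh",
    "roh-sr": "romontsch sursilvan",
    "roh-st": "rumàntsch sutsilvan",
    "roh-sm": "rumantsch surmiran",
    "roh-pt": "rumauntsch puter",
    "roh-vl": "rumantsch vallader",
    "roh-gr": "rumantsch grischun",
}


def describe_code(code: str) -> str:
    """Return the label for a language code or a Romansh variety code."""
    if code in ROMANSH_VARIETIES:
        return ROMANSH_VARIETIES[code]
    if code in LANGUAGES:
        return LANGUAGES[code]
    raise KeyError(f"unknown language code: {code}")


def base_language(code: str) -> str:
    """Reduce a variety code such as 'roh-sr' to its base language code."""
    base = code.split("-", 1)[0]
    if base not in LANGUAGES:
        raise KeyError(f"unknown language code: {code}")
    return base
```

For example, `describe_code("roh-sr")` gives the variety label, while `base_language("roh-sr")` reduces the code to `roh`, so that all Romansh varieties can be grouped together.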
01_corpus/start.1587146912.txt.gz · Last modified: 2022/06/27 09:21 (external edit)