Last revised 2025/09/16 12:02 by Gabrielle Aguila-Multner (previous revision: 2020/04/14 11:26 by simone).
====== 1. THE CORPUS ======
The corpus consists of 617 chats that members of the Swiss public submitted in 2014. Submission followed a fixed procedure, which was publicized in the press to attract participants. The individual chats were checked for their [[01_corpus:
The next processing steps comprised [[01_corpus:
Our authentic WhatsApp chats were collected in summer 2014. Not all of them made it into the corpus: duplicates, for example, were excluded, as were chats or messages for which no permission had been given.

  * Number
  * Number of messages (with permission to be used): 763’644
  * Number of informants (who gave their permission):
  * Number of tokens: 5'
  * Number of emojis: 382'

The corpus comprises chats in all four national languages of Switzerland,

Available languages:
  * fra: French
  * ita: Italian
  * roh: Romansh (any variety)
  * gsw: dialectal German as used in Switzerland
  * deu: non-dialectal German
  * eng: English
  * spa: Spanish
  * sla: any Slavic language

Romansh varieties:

  * roh-ja: Jauer Romansh
  * roh-sr: romontsch sursilvan
  * roh-st: rumàntsch sutsilvan
  * roh-sm: rumantsch surmiran
  * roh-pt: rumauntsch puter
  * roh-vl: rumantsch vallader
  * roh-gr: rumantsch grischun
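The tags above combine a three-letter language code with an optional variety suffix: every Romansh variety tag starts with `roh-`. A minimal Python sketch of how that structure could be used when filtering material by language follows: it reduces a variety tag to its base tag, so that all Romansh material can be selected at once. It assumes the tags are available as plain strings; `LANGUAGE_TAGS`, `base_language` and `select` are hypothetical helpers, not part of ANNIS or of any official corpus tooling.

```python
from typing import Iterable, List

# Hypothetical helper data copied from the tag lists on this page.
# Assumption: every "roh-xx" tag refines the generic "roh" tag.
LANGUAGE_TAGS = {
    "fra": "French",
    "ita": "Italian",
    "roh": "Romansh (any variety)",
    "gsw": "dialectal German as used in Switzerland",
    "deu": "non-dialectal German",
    "eng": "English",
    "spa": "Spanish",
    "sla": "any Slavic language",
    "roh-ja": "Jauer Romansh",
    "roh-sr": "romontsch sursilvan",
    "roh-st": "rumàntsch sutsilvan",
    "roh-sm": "rumantsch surmiran",
    "roh-pt": "rumauntsch puter",
    "roh-vl": "rumantsch vallader",
    "roh-gr": "rumantsch grischun",
}


def base_language(tag: str) -> str:
    """Reduce a known tag to its language part, e.g. 'roh-sr' -> 'roh'."""
    if tag not in LANGUAGE_TAGS:
        raise KeyError(f"unknown language tag: {tag!r}")
    return tag.split("-", 1)[0]


def select(tags: Iterable[str], language: str) -> List[str]:
    """Keep the tags that belong to `language`, including its varieties."""
    return [tag for tag in tags if base_language(tag) == language]


if __name__ == "__main__":
    sample = ["gsw", "roh-sr", "roh", "deu", "roh-vl"]
    print(select(sample, "roh"))  # ['roh-sr', 'roh', 'roh-vl']
```

Keeping the generic tags and the variety tags in one flat table means a single lookup validates any tag; a nested mapping would serve equally well if further languages were ever given variety codes.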

The main way to browse the corpus is through the [[https://

Krause, Thomas & Zeldes, Amir (2016): ANNIS3: A new architecture for generic corpus query and visualization. In: Digital Scholarship in the Humanities 31. [[http://
