01_corpus:02_preprocessing:04_languages
Differences
This shows you the differences between two versions of the page.
| 01_corpus:02_preprocessing:04_languages [2020/04/16 14:42] – simone | 01_corpus:02_preprocessing:04_languages [2022/06/27 07:21] (current) – external edit 127.0.0.1 | ||
|---|---|---|---|
| Line 2: | Line 2: | ||
| ===== Languages and varieties per chat ===== | ===== Languages and varieties per chat ===== | ||
| - | In order to assign a language tagging to each chat, we looked the first 250 messages and assigned two possible attributes per language: | + | In order to assign a language tagging to each chat, we looked |
| * lang_100_and_more: | * lang_100_and_more: | ||
| Line 22: | Line 22: | ||
| For an overview of the languages and varieties in the corpus, consult: | For an overview of the languages and varieties in the corpus, consult: | ||
| - | Ueberwasser, | + | Ueberwasser, |
| - | ===== 1.3.5 Languages and varieties per message ===== | + | ===== Languages and varieties per message ===== |
| - | The information of the main language of a message is saved in the annotation most_likely_lang and can thus be queried with e.g. '' | + | The information of the main language of a message is saved in the annotation |
| Available languages: | Available languages: | ||
