Based on the language annotation of each chat, several sub-corpora were created.
The following basic considerations were applied when creating the sub-corpora:
In addition to these corpora, the browser also lists corpora with lowercase names (e.g. deu-rftagged, ita-tagged, roh). These corpora contain data from our SMS project.
In addition to these main sub-corpora, there are some smaller sub-corpora:
The size and other properties of the individual sub-corpora are documented within the browsing tool. See the corresponding section for more information.