01_corpus:02_preprocessing:06_pos
no way to compare when less than two revisions
Differences
This shows you the differences between two versions of the page.
Previous revisionNext revision | |||
— | 01_corpus:02_preprocessing:06_pos [2020/04/17 11:11] – simone | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | ====== 1.2.6 Part of Speech Tagging ====== | ||
+ | Some sub-corpora have been annotated with Part Of Speech annotations. This concerns WUS_DIALOG_GSW, | ||
+ | WUS_FRA_DEMOG, | ||
+ | |||
+ | ===== French ===== | ||
+ | The whole French corpus has been annotated with [[https:// | ||
+ | |||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | |||
+ | |||
+ | |||
+ | ===== Swiss German dialect ===== | ||
+ | Five chats of the Swiss German dialectal data (34,683 tokens) have been manually normalized and annotated for Part of Speech. The according corpus is called WUS_DIALOG_GSW. Three annotations have been added to each token: | ||
+ | |||
+ | * gloss: The manual normalization | ||
+ | * tt_pos: Part of Speech annotation with [[https:// | ||
+ | * tt_lem: The lemma as assigned by TreeTagger | ||
+ | |||
+ | |||
+ | The [[https:// | ||
+ | |||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | |||
+ | As in the French corpus, there are also combined tags such as // | ||
+ | |||
+ | |||
+ | ===== Italian ===== | ||
+ | The Italian corpus is annotated with the [[https:// | ||
+ | |||
+ | * tt_pos: Part of Speech annotation with TreeTagger | ||
+ | * tt_lem: The lemma as assigned by TreeTagger | ||
+ | |||
+ | The following PoS [[https:// | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
01_corpus/02_preprocessing/06_pos.txt · Last modified: 2022/06/27 09:21 by 127.0.0.1