01_corpus:02_preprocessing:06_pos
01_corpus:02_preprocessing:06_pos [2020/04/16 17:45] – [1.2.6 Part of Speech Tagging] simone | 01_corpus:02_preprocessing:06_pos [2022/06/27 09:21] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 6: | Line 6: | ||
The whole French corpus has been annotated with [[https:// | The whole French corpus has been annotated with [[https:// | ||
- | * ADJ adjective | + | * '' |
- | * ADJWH interrogative adjective | + | * '' |
- | * ADV adverb | + | * '' |
- | * ADVWH interrogative adverb | + | * '' |
- | * CC coordination | + | * '' |
- | * CLO object clitic pronoun | + | * '' |
- | * CLR reflexive clitic pronoun | + | * '' |
- | * CLS subject clitic pronoun | + | * '' |
- | * CS subordination | + | * '' |
- | * DET determiner | + | * '' |
- | * DETWH interrogative determiner | + | * '' |
- | * ET foreign word | + | * '' |
- | * I interjection | + | * '' |
- | * NC common noun | + | * '' |
- | * NPP proper noun | + | * '' |
- | * P preposition | + | * '' |
- | * P+D preposition+determiner amalgam | + | * '' |
- | * P+PRO preposition+pronoun amalgam | + | * '' |
- | * PONCT punctuation mark | + | * '' |
- | * PREF prefix | + | * '' |
- | * PRO full pronoun | + | * '' |
- | * PROREL relative pronoun | + | * '' |
- | * PROWH interrogative pronoun | + | * '' |
- | * V indicative or conditional verb form | + | * '' |
- | * VIMP imperative verb form | + | * '' |
- | * VINF infinitive verb form | + | * '' |
- | * VPP past participle | + | * '' |
- | * VPR present participle | + | * '' |
- | * VS subjunctive verb form | + | * '' |
- | Additionally, | ||
- | * CLS+V | ||
- | * CLS+CLO | ||
- | * CS+CS | ||
- | * CLS+CLO+V | ||
- | * ADV+CLR+V+ADV | ||
- | * DET+NC | ||
- | * CLS+CLR | ||
- | * CLS+CLR+V | ||
- | * PRO+V | ||
- | * P+NC | ||
- | * CLR+V | ||
- | * CLO+V | ||
- | * DET+ADJ | ||
- | * V+CLS | ||
- | * CS+CLS | ||
- | * P+PRO | ||
- | * ADV+V | ||
- | * DET+DET | ||
- | * DET+PRO | ||
- | * CLO+CLO | ||
- | * P+VINF | ||
- | * CLS+CLO+P | ||
- | * P+ADJ | ||
- | * CLS+VS | ||
- | * CLS+CLO+CLO | ||
- | * CLR+VINF | ||
- | * CLS+NC | ||
- | * CLS+DET | ||
- | * PROWH+V+CLS+CS | ||
- | * ADV+ADV+ADV | ||
- | * NPP+V | ||
- | * CLS+CLR+CLO | ||
- | * DET+VPP | ||
- | * ADV+ADV+CS | ||
- | * ET+CLO+V | ||
- | * ADV+VPP | ||
- | * ADV+VINF | ||
- | * CLS+P | ||
- | * P+VPP | ||
- | * CLS+VPP | ||
- | * CLR+NC | ||
- | * ET+CLO | ||
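The combined tags listed above join basic tags with ''+'', but two basic tags (''P+D'' and ''P+PRO'') themselves contain a ''+'', so a plain split on ''+'' would break them apart. The following is a minimal sketch, not the corpus tooling: the function name ''split_combined_tag'' is hypothetical, and the longest-match strategy is an assumption about how such tags could be decomposed.

```python
# Basic French tags copied from the list above. "P+D" and "P+PRO" are
# single tags that happen to contain "+".
BASIC_TAGS = {
    "ADJ", "ADJWH", "ADV", "ADVWH", "CC", "CLO", "CLR", "CLS", "CS",
    "DET", "DETWH", "ET", "I", "NC", "NPP", "P", "P+D", "P+PRO",
    "PONCT", "PREF", "PRO", "PROREL", "PROWH", "V", "VIMP", "VINF",
    "VPP", "VPR", "VS",
}


def split_combined_tag(tag):
    """Split a combined tag into basic tags, preferring the longest match.

    Hypothetical helper: the longest-match rule is an assumption, chosen so
    that "P+D" and "P+PRO" are kept whole instead of being cut at the "+".
    """
    parts = tag.split("+")
    result = []
    i = 0
    while i < len(parts):
        # Try the longest candidate first, then progressively shorter ones.
        for j in range(len(parts), i, -1):
            candidate = "+".join(parts[i:j])
            if candidate in BASIC_TAGS:
                result.append(candidate)
                i = j
                break
        else:
            # No prefix of the remaining parts is a known basic tag.
            raise ValueError(f"unknown tag component in {tag!r}")
    return result


print(split_combined_tag("CLS+CLO+V"))
print(split_combined_tag("P+PRO"))
```

Under this assumption, ''P+PRO'' is always read as the single amalgam tag, even though it also appears in the list of combined tags; if the corpus intends the two-token reading there, the lookup order would need to change.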
===== Swiss German dialect ===== | ===== Swiss German dialect ===== | ||
- | A small part of the Swiss German dialectal data has been manually normalized and annotated for Part of Speech. The corresponding corpus is called WUS_DIALOG_GSW. Three annotations have been added to each token: | + | Five chats of the Swiss German dialectal data (34,683 tokens) have been manually normalized and annotated for Part of Speech. The corresponding corpus is called WUS_DIALOG_GSW. Three annotations have been added to each token: |
* gloss: The manual normalization | * gloss: The manual normalization | ||
- | * tt_pos: Part of Speech annotation with [[https:// | + | * tt_pos: Part of Speech annotation with [[https:// |
* tt_lem: The lemma as assigned by TreeTagger | * tt_lem: The lemma as assigned by TreeTagger | ||
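As an illustration of the three token-level annotations, the sketch below models a token as a small record and counts Part of Speech tags. It makes no claim about how WUS_DIALOG_GSW is actually stored or queried: the ''Token'' class, the ''form'' field, the ''pos_frequencies'' helper, and the three sample tokens are all made up for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Token:
    form: str    # original spelling (hypothetical field, not from the page)
    gloss: str   # the manual normalization
    tt_pos: str  # Part of Speech annotation
    tt_lem: str  # the lemma as assigned by TreeTagger


def pos_frequencies(tokens):
    """Count how often each tt_pos value occurs in a list of tokens."""
    return Counter(token.tt_pos for token in tokens)


# Invented sample tokens, for illustration only (not taken from the corpus).
sample = [
    Token("i", "ich", "PPER", "ich"),
    Token("chume", "komme", "VVFIN", "kommen"),
    Token("morn", "morgen", "ADV", "morgen"),
]

for tag, count in sorted(pos_frequencies(sample).items()):
    print(tag, count)
```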
Line 90: | Line 48: | ||
The [[https:// | The [[https:// | ||
- | * ADJA attributive adjective (including participles used adjectivally) //das große Haus; die versunkene Glocke// | + | * '' |
- | * ADJD predicate adjective; adjective used adverbially //der Vogel ist blau; er fährt schnell// | + | * '' |
- | * ADV adverb (never used as attributive adjective) //sie kommt bald// | + | * '' |
- | * APPR preposition; left hand part of double preposition //auf dem Tisch; an der Straße entlang// | + | * '' |
- | * APPRART preposition with fused article //am Tag// | + | * '' |
- | * APPO postposition //meiner Meinung nach// | + | * '' |
- | * APZR right hand part of double preposition //an der Straße entlang// | + | * '' |
- | * ART article (definite or indefinite) //die Tante; eine Tante// | + | * '' |
- | * CARD cardinal number (words or figures); also declined //zwei; 526; dreier// | + | * '' |
- | * FM foreign words (actual part of speech in original language may be appended, e.g. FMADV/ FM-NN) //semper fidem// | + | * '' |
- | * ITJ interjection //Ach!// | + | * '' |
- | * KON co-ordinating conjunction //oder ich bezahle nicht// | + | * '' |
- | * KOKOM comparative conjunction or particle //er arbeitet als Straßenfeger, | + | * '' |
- | * KOUI preposition used to introduce infinitive clause //um den König zu töten// | + | * '' |
- | * KOUS subordinating conjunction //weil er sie gesehen hat// | + | * '' |
- | * NA adjective used as noun //der Gesandte// | + | * '' |
- | * NE names and other proper nouns //Moskau// | + | * '' |
- | * NN noun (but not adjectives used as nouns) //der Abend// | + | * '' |
- | * PAV [PROAV] pronominal adverb //sie spielt damit// | + | * '' |
- | * PAVREL pronominal adverb used as relative //die Puppe, damit sie spielt// | + | * '' |
- | * PDAT demonstrative determiner //dieser Mann war schlecht// | + | * '' |
- | * PDS demonstrative pronoun //dieser war schlecht// | + | * '' |
- | * PIAT indefinite determiner (whether occurring on its own or in conjunction with another determiner) //einige Wochen, viele solche Bemerkungen// | + | * '' |
- | * PIS indefinite pronoun //sie hat viele gesehen// | + | * '' |
- | * PPER personal pronoun //sie liebt mich// | + | * '' |
- | * PRF reflexive pronoun //ich wasche mich, sie wäscht sich// | + | * '' |
- | * PPOSS possessive pronoun //das ist meins// | + | * '' |
- | * PPOSAT possessive determiner //mein Buch, das ist der meine/ | + | * '' |
- | * PRELAT relative depending on a noun //der Mann, dessen Lied ich singe […], welchen Begriff ich nicht verstehe// | + | * '' |
- | * PRELS relative pronoun (i.e. forms of der or welcher) //der Herr, der gerade kommt; der Herr, welcher | + | * '' |
- | * PTKA particle with adjective or adverb //am besten, zu schnell, aufs herzlichste// | + | * '' |
- | * PTKANT answer particle //ja, nein// | + | * '' |
- | * PTKNEG negative particle //nicht// | + | * '' |
- | * PTKREL indeclinable relative particle //so// | + | * '' |
- | * PTKVZ separable prefix //sie kommt an// | + | * '' |
- | * PTKZU infinitive particle zu | + | * '' |
- | * PWS interrogative pronoun //wer kommt?// | + | * '' |
- | * PWAT interrogative determiner //welche Farbe?// | + | * '' |
- | * PWAV interrogative adverb //wann kommst du?// | + | * '' |
- | * PWAVREL interrogative adverb used as relative //der Zaun, worüber sie springt// | + | * '' |
- | * PWREL interrogative pronoun used as relative //etwas, was er sieht// | + | * '' |
- | * TRUNC truncated form of compound //Vor- und Nachteile// | + | * '' |
- | * VAFIN finite auxiliary verb //sie ist gekommen// | + | * '' |
- | * VAIMP imperative of auxiliary //sei still!// | + | * '' |
- | * VAINF infinitive of auxiliary //er wird es gesehen haben// | + | * '' |
- | * VAPP past participle of auxiliary //sie ist es gewesen// | + | * '' |
- | * VMFIN finite modal verb //sie will kommen// | + | * '' |
- | * VMINF infinitive of modal //er hat es sehen müssen// | + | * '' |
- | * VMPP past participle of modal //sie hat es gekonnt// | + | * '' |
- | * VVFIN finite full verb //sie ist gekommen// | + | * '' |
- | * VVIMP imperative of full verb //bleibt da!// | + | * '' |
- | * VVINF infinitive of full verb //er wird es sehen// | + | * '' |
- | * VVIZU infinitive with incorporated | + | * '' |
- | * VVPP past participle of full verb //sie ist gekommen// | + | * '' |
As in the French corpus, there are also combined tags such as // | As in the French corpus, there are also combined tags such as // | ||
Line 149: | Line 107: | ||
===== Italian ===== | ===== Italian ===== | ||
- | The Italian corpus is annotated with the [[https:// | + | The Italian corpus is annotated with the [[https:// |
- | * gloss: The manual normalization (often _UNGLOSSED_) | ||
* tt_pos: Part of Speech annotation with TreeTagger | * tt_pos: Part of Speech annotation with TreeTagger | ||
* tt_lem: The lemma as assigned by TreeTagger | * tt_lem: The lemma as assigned by TreeTagger | ||
The following PoS [[https:// | The following PoS [[https:// | ||
- | * ABR abbreviation | + | * '' |
- | * ADJ adjective | + | * '' |
- | * ADV adverb | + | * '' |
- | * CON conjunction | + | * '' |
- | * DET: | + | * '' |
- | * DET: | + | * '' |
- | * FW foreign word | + | * '' |
- | * INT interjection | + | * '' |
- | * LS list symbol | + | * '' |
- | * NOM noun | + | * '' |
- | * NPR name | + | * '' |
- | * NUM numeral | + | * '' |
- | * PON punctuation | + | * '' |
- | * PRE preposition | + | * '' |
- | * PRE: | + | * '' |
- | * PRO pronoun | + | * '' |
- | * PRO: | + | * '' |
- | * PRO: | + | * '' |
- | * PRO: | + | * '' |
- | * PRO: | + | * '' |
- | * PRO: | + | * '' |
- | * PRO: | + | * '' |
- | * PRO: | + | * '' |
- | * SENT sentence marker | + | * '' |
- | * SYM symbol | + | * '' |
- | * VER: | + | * '' |
- | * VER: | + | * '' |
- | * VER: | + | * '' |
- | * VER: | + | * '' |
- | * VER: | + | * '' |
- | * VER: | + | * '' |
- | * VER: | + | * '' |
- | * VER: | + | * '' |
- | * VER: | + | * '' |
- | * VER: | + | * '' |
- | * VER: | + | * '' |
- | * VER: | + | * '' |
- | * VER: | + | * '' |
01_corpus/02_preprocessing/06_pos.1587051932.txt.gz · Last modified: 2022/06/27 09:21 (external edit)