01_corpus:02_annotations:06_pos [2020/04/16 16:35]: Page moved and renamed from 01_corpus:04_annotations:02_pos to 01_corpus:02_annotations:06_pos (simone); 01_corpus:02_preprocessing:06_pos [2020/05/04 13:48]: [Swiss German dialect] (simone)
====== 1.2.6 Part of Speech Tagging ======
Some sub-corpora have been annotated with Part of Speech (PoS) tags, namely WUS_DIALOG_GSW, WUS_FRA, WUS_FRA_DEMOG, WUS_ITA and WUS_ITA_DEMOG.
  
===== French =====
The whole French corpus has been annotated with [[https://team.inria.fr/almanach/fr/melt/|MElt]] (Modified French TreeBank) using the [[http://french-postaggers.tiddlyspot.com/|CC tagset]]. Two annotations are available: "mftb_pos" (part of speech) and "mftb_lem" (lemma). The following tags are used:
  
  * ''ADJ'' adjective
  * ''ADJWH'' interrogative adjective
  * ''ADV'' adverb
  * ''ADVWH'' interrogative adverb
  * ''CC'' coordinating conjunction
  * ''CLO'' object clitic pronoun
  * ''CLR'' reflexive clitic pronoun
  * ''CLS'' subject clitic pronoun
  * ''CS'' subordinating conjunction
  * ''DET'' determiner
  * ''DETWH'' interrogative determiner
  * ''ET'' foreign word
  * ''I'' interjection
  * ''NC'' common noun
  * ''NPP'' proper noun
  * ''P'' preposition
  * ''P+D'' preposition+determiner amalgam
  * ''P+PRO'' preposition+pronoun amalgam
  * ''PONCT'' punctuation mark
  * ''PREF'' prefix
  * ''PRO'' full pronoun
  * ''PROREL'' relative pronoun
  * ''PROWH'' interrogative pronoun
  * ''V'' indicative or conditional verb form
  * ''VIMP'' imperative verb form
  * ''VINF'' infinitive verb form
  * ''VPP'' past participle
  * ''VPR'' present participle
  * ''VS'' subjunctive verb form
  
Additionally, combined tags can occur, in the same way as ''P+D'' marks a preposition fused with a determiner, as in //aux//. The following combined tags are listed by their number of occurrences in the corpus:
  * ''CLS+V''
  * ''CLS+CLO''
  * ''CS+CS''
  * ''CLS+CLO+V''
  * ''ADV+CLR+V+ADV''
  * ''DET+NC''
  * ''CLS+CLR''
  * ''CLS+CLR+V''
  * ''PRO+V''
  * ''P+NC''
  * ''CLR+V''
  * ''CLO+V''
  * ''DET+ADJ''
  * ''V+CLS''
  * ''CS+CLS''
  * ''P+PRO''
  * ''ADV+V''
  * ''DET+DET''
  * ''DET+PRO''
  * ''CLO+CLO''
  * ''P+VINF''
  * ''CLS+CLO+P''
  * ''P+ADJ''
  * ''CLS+VS''
  * ''CLS+CLO+CLO''
  * ''CLR+VINF''
  * ''CLS+NC''
  * ''CLS+DET''
  * ''PROWH+V+CLS+CS''
  * ''ADV+ADV+ADV''
  * ''NPP+V''
  * ''CLS+CLR+CLO''
  * ''DET+VPP''
  * ''ADV+ADV+CS''
  * ''ET+CLO+V''
  * ''ADV+VPP''
  * ''ADV+VINF''
  * ''CLS+P''
  * ''P+VPP''
  * ''CLS+VPP''
  * ''CLR+NC''
  * ''ET+CLO''
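''P+D'' and ''P+PRO'' are single tags in their own right, so a combined tag cannot simply be split at every ''+''. The following is a minimal Python sketch that assumes tags are read as plain strings from the mftb_pos layer (the export format is not described here). It segments a combined tag by greedy longest match against the base tags listed above. ''split_combined'' is a hypothetical helper name, not part of MElt.

```python
# Base tags of the CC tagset as listed above (two of them contain "+")
FRENCH_TAGS = frozenset({
    "ADJ", "ADJWH", "ADV", "ADVWH", "CC", "CLO", "CLR", "CLS", "CS",
    "DET", "DETWH", "ET", "I", "NC", "NPP", "P", "P+D", "P+PRO",
    "PONCT", "PREF", "PRO", "PROREL", "PROWH", "V", "VIMP", "VINF",
    "VPP", "VPR", "VS",
})


def split_combined(tag, base_tags=FRENCH_TAGS):
    """Split a combined tag into base tags by greedy longest match."""
    parts = tag.split("+")
    result = []
    i = 0
    while i < len(parts):
        # Try the longest span first so amalgams such as "P+D" stay intact
        for j in range(len(parts), i, -1):
            candidate = "+".join(parts[i:j])
            if candidate in base_tags:
                result.append(candidate)
                i = j
                break
        else:
            raise ValueError(f"unknown component in {tag!r}: {parts[i]!r}")
    return result
```

With this approach ''CLS+CLO+V'' becomes ''CLS'', ''CLO'', ''V'' and ''P+VINF'' becomes ''P'', ''VINF'', while ''P+D'' is kept as one unit. Note that ''P+PRO'' is listed both as a base tag and as a combination. This sketch always treats it as the single base tag.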
  
===== Swiss German dialect =====
Five chats of the Swiss German dialectal data (34,683 tokens) have been manually normalized and annotated for Part of Speech. The corresponding corpus is called WUS_DIALOG_GSW. Three annotations have been added to each token:
  
  * gloss: the manual normalization
  * tt_pos: Part of Speech annotation with [[https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]], based on the manually normalized tokens, i.e. on "gloss"
  * tt_lem: the lemma as assigned by TreeTagger
  
The [[https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/stts_guide.pdf|STTS tagset]] uses the following tags:
  
  * ''ADJA'' attributive adjective (including participles used adjectivally) //das große Haus, die versunkene Glocke//
  * ''ADJD'' predicative adjective; adjective used adverbially //der Vogel ist blau; er fährt schnell//
  * ''ADV'' adverb (never used as attributive adjective) //sie kommt bald//
  * ''APPR'' preposition; left-hand part of a double preposition //auf dem Tisch; an der Straße entlang//
  * ''APPRART'' preposition with fused article //am Tag//
  * ''APPO'' postposition //meiner Meinung nach//
  * ''APZR'' right-hand part of a double preposition //an der Straße entlang//
  * ''ART'' article (definite or indefinite) //die Tante; eine Tante//
  * ''CARD'' cardinal number (words or figures), also declined //zwei; 526; dreier//
  * ''FM'' foreign word (the actual part of speech in the original language may be appended, e.g. ''FM-ADV'', ''FM-NN'') //semper fidem//
  * ''ITJ'' interjection //Ach!//
  * ''KON'' coordinating conjunction //oder ich bezahle nicht//
  * ''KOKOM'' comparative conjunction or particle //er arbeitet als Straßenfeger; so gut wie du//
  * ''KOUI'' conjunction introducing an infinitive clause //um den König zu töten//
  * ''KOUS'' subordinating conjunction //weil er sie gesehen hat//
  * ''NA'' adjective used as noun //der Gesandte//
  * ''NE'' names and other proper nouns //Moskau//
  * ''NN'' noun (but not adjectives used as nouns) //der Abend//
  * ''PAV'' [''PROAV''] pronominal adverb //sie spielt damit//
  * ''PAVREL'' pronominal adverb used as relative //die Puppe, damit sie spielt//
  * ''PDAT'' demonstrative determiner //dieser Mann war schlecht//
  * ''PDS'' demonstrative pronoun //dieser war schlecht//
  * ''PIAT'' indefinite determiner (whether occurring on its own or in conjunction with another determiner) //einige Wochen, viele solche Bemerkungen//
  * ''PIS'' indefinite pronoun //sie hat viele gesehen//
  * ''PPER'' personal pronoun //sie liebt mich//
  * ''PRF'' reflexive pronoun //ich wasche mich, sie wäscht sich//
  * ''PPOSS'' possessive pronoun //das ist meins//
  * ''PPOSAT'' possessive determiner //mein Buch, das ist der meine/meinige//
  * ''PRELAT'' attributive relative pronoun (depending on a noun) //der Mann, dessen Lied ich singe […], welchen Begriff ich nicht verstehe//
  * ''PRELS'' relative pronoun (i.e. forms of //der// or //welcher//) //der Herr, der gerade kommt; der Herr, welcher nun kommt//
  * ''PTKA'' particle with adjective or adverb //am besten, zu schnell, aufs herzlichste//
  * ''PTKANT'' answer particle //ja, nein//
  * ''PTKNEG'' negative particle //nicht//
  * ''PTKREL'' indeclinable relative particle //so//
  * ''PTKVZ'' separable verb prefix //sie kommt an//
  * ''PTKZU'' infinitive particle //zu//
  * ''PWS'' interrogative pronoun //wer kommt?//
  * ''PWAT'' interrogative determiner //welche Farbe?//
  * ''PWAV'' interrogative adverb //wann kommst du?//
  * ''PWAVREL'' interrogative adverb used as relative //der Zaun, worüber sie springt//
  * ''PWREL'' interrogative pronoun used as relative //etwas, was er sieht//
  * ''TRUNC'' truncated form of a compound //Vor- und Nachteile//
  * ''VAFIN'' finite auxiliary verb //sie ist gekommen//
  * ''VAIMP'' imperative of auxiliary //sei still!//
  * ''VAINF'' infinitive of auxiliary //er wird es gesehen haben//
  * ''VAPP'' past participle of auxiliary //sie ist es gewesen//
  * ''VMFIN'' finite modal verb //sie will kommen//
  * ''VMINF'' infinitive of modal //er hat es sehen müssen//
  * ''VMPP'' past participle of modal //sie hat es gekonnt//
  * ''VVFIN'' finite full verb //du gehst; wir kommen an//
  * ''VVIMP'' imperative of full verb //bleibt da!//
  * ''VVINF'' infinitive of full verb //er wird es sehen//
  * ''VVIZU'' infinitive with incorporated zu //sie versprach aufzuhören//
  * ''VVPP'' past participle of full verb //sie ist gekommen//
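The verb tags above are built compositionally. The first two letters give the verb type (''VA'' auxiliary, ''VM'' modal, ''VV'' full verb), and the remainder gives the form. The following is a minimal Python sketch that assumes tags are handled as plain strings. It decodes a verb tag and rejects combinations that do not appear in the list above. ''parse_verb_tag'' is a hypothetical helper, not part of TreeTagger.

```python
from typing import Tuple

# Verb tags exactly as listed in the STTS tagset above
STTS_VERB_TAGS = frozenset({
    "VAFIN", "VAIMP", "VAINF", "VAPP",
    "VMFIN", "VMINF", "VMPP",
    "VVFIN", "VVIMP", "VVINF", "VVIZU", "VVPP",
})

# First two letters encode the verb type, the rest encodes the form
VERB_TYPES = {"VA": "auxiliary", "VM": "modal", "VV": "full"}
VERB_FORMS = {
    "FIN": "finite",
    "IMP": "imperative",
    "INF": "infinitive",
    "IZU": "infinitive with zu",
    "PP": "past participle",
}


def parse_verb_tag(tag: str) -> Tuple[str, str]:
    """Return (verb type, verb form) for an STTS verb tag."""
    # Membership check first, so e.g. "VMIMP" (not in the list) is rejected
    if tag not in STTS_VERB_TAGS:
        raise ValueError(f"not an STTS verb tag: {tag!r}")
    return VERB_TYPES[tag[:2]], VERB_FORMS[tag[2:]]
```

For example, ''VMPP'' decodes to a modal past participle and ''VVIZU'' to a full verb infinitive with incorporated zu.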
  
As in the French corpus, there are also combined tags such as //VAFIN+PPER// when a personal pronoun is agglutinated to a verb (//hätti// for 'hätte ich').
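No STTS tag contains a ''+'' itself, unlike ''P+D'' in the French tagset. A combined tag such as ''VAFIN+PPER'' can therefore be split on ''+'' directly. The following Python sketch assumes that each WUS_DIALOG_GSW token is available as a record carrying the original word form plus the three layers gloss, tt_pos and tt_lem. The ''Token'' class, its ''word'' field and the helpers ''pos_components'' and ''fused_with'' are hypothetical names, not an export format of this corpus.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Token:
    """Hypothetical in-memory record for one WUS_DIALOG_GSW token."""
    word: str                     # original dialect spelling (field name assumed)
    gloss: str                    # manual normalization
    tt_pos: str                   # TreeTagger PoS tag, possibly combined with "+"
    tt_lem: Optional[str] = None  # TreeTagger lemma


def pos_components(tag: str) -> List[str]:
    # No STTS base tag contains "+", so a plain split is safe here
    return tag.split("+")


def fused_with(tokens: Iterable[Token], component: str) -> List[Token]:
    """Return tokens whose tt_pos is a combined tag containing `component`."""
    return [
        t for t in tokens
        if "+" in t.tt_pos and component in pos_components(t.tt_pos)
    ]
```

Calling ''fused_with(tokens, "PPER")'' would, for instance, collect agglutinated forms like //hätti//.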
  
===== Italian =====
The Italian corpus has been annotated with [[https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] as well, but based on the original tokens, i.e. without manual normalization. Only some parts of this sub-corpus were manually normalized, so the following annotations are available:

  * gloss: the manual normalization (often ''_UNGLOSSED_'')
  * tt_pos: Part of Speech annotation with TreeTagger
  * tt_lem: the lemma as assigned by TreeTagger
  
The following PoS [[https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/italian-tagset.txt|tagset]] was used:
  * ''ABR'' abbreviation
  * ''ADJ'' adjective
  * ''ADV'' adverb
  * ''CON'' conjunction
  * ''DET:def'' definite article
  * ''DET:indef'' indefinite article
  * ''FW'' foreign word
  * ''INT'' interjection
  * ''LS'' list symbol
  * ''NOM'' noun
  * ''NPR'' name
  * ''NUM'' numeral
  * ''PON'' punctuation
  * ''PRE'' preposition
  * ''PRE:det'' preposition+article
  * ''PRO'' pronoun
  * ''PRO:demo'' demonstrative pronoun
  * ''PRO:indef'' indefinite pronoun
  * ''PRO:inter'' interrogative pronoun
  * ''PRO:pers'' personal pronoun
  * ''PRO:poss'' possessive pronoun
  * ''PRO:refl'' reflexive pronoun
  * ''PRO:rela'' relative pronoun
  * ''SENT'' sentence marker
  * ''SYM'' symbol
  * ''VER:cimp'' verb, subjunctive imperfect
  * ''VER:cond'' verb, conditional
  * ''VER:cpre'' verb, subjunctive present
  * ''VER:futu'' verb, future tense
  * ''VER:geru'' verb, gerund
  * ''VER:impe'' verb, imperative
  * ''VER:impf'' verb, imperfect
  * ''VER:infi'' verb, infinitive
  * ''VER:pper'' verb, past participle
  * ''VER:ppre'' verb, present participle
  * ''VER:pres'' verb, present
  * ''VER:refl:infi'' verb, reflexive infinitive
  * ''VER:remo'' verb, simple past
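The Italian tags are hierarchical. The part before the first colon is the main category, and any further colon-separated parts refine it. The following is a minimal Python sketch that assumes tags are read as plain strings from the tt_pos layer. It parses a tag into its category and features and collapses tags to coarse categories, for example to count all verb forms together. ''parse_tag'' and ''coarse_counts'' are hypothetical helper names.

```python
from collections import Counter
from typing import Iterable, Tuple

# Tags as listed in the Italian TreeTagger tagset above
ITALIAN_TAGS = frozenset({
    "ABR", "ADJ", "ADV", "CON", "DET:def", "DET:indef", "FW", "INT",
    "LS", "NOM", "NPR", "NUM", "PON", "PRE", "PRE:det", "PRO",
    "PRO:demo", "PRO:indef", "PRO:inter", "PRO:pers", "PRO:poss",
    "PRO:refl", "PRO:rela", "SENT", "SYM", "VER:cimp", "VER:cond",
    "VER:cpre", "VER:futu", "VER:geru", "VER:impe", "VER:impf",
    "VER:infi", "VER:pper", "VER:ppre", "VER:pres", "VER:refl:infi",
    "VER:remo",
})


def parse_tag(tag: str) -> Tuple[str, Tuple[str, ...]]:
    """Split a colon-separated tag into its main category and sub-features."""
    if tag not in ITALIAN_TAGS:
        raise ValueError(f"not in the Italian tagset: {tag!r}")
    category, *features = tag.split(":")
    return category, tuple(features)


def coarse_counts(tags: Iterable[str]) -> Counter:
    """Count tags by main category only (e.g. every VER:* tag as VER)."""
    return Counter(parse_tag(tag)[0] for tag in tags)
```

Rejecting tags outside the list, rather than silently splitting any string, makes unexpected tagger output visible early.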
  
01_corpus/02_preprocessing/06_pos.txt · Last modified: 2022/06/27 09:21 by 127.0.0.1
