The whole French corpus has been annotated with [[https://team.inria.fr/almanach/fr/melt/|MElt]] (Modified French TreeBank) using the [[http://french-postaggers.tiddlyspot.com/|CC tagset]]. Two annotations are available: "mftb_pos" (part of speech) and "mftb_lem" (lemma). The following tags are used:
  
  * ''ADJ'' adjective
  * ''ADJWH'' interrogative adjective
  * ''ADV'' adverb
  * ''ADVWH'' interrogative adverb
  * ''CC'' coordinating conjunction
  * ''CLO'' object clitic pronoun
  * ''CLR'' reflexive clitic pronoun
  * ''CLS'' subject clitic pronoun
  * ''CS'' subordinating conjunction
  * ''DET'' determiner
  * ''DETWH'' interrogative determiner
  * ''ET'' foreign word
  * ''I'' interjection
  * ''NC'' common noun
  * ''NPP'' proper noun
  * ''P'' preposition
  * ''P+D'' preposition+determiner amalgam
  * ''P+PRO'' preposition+pronoun amalgam
  * ''PONCT'' punctuation mark
  * ''PREF'' prefix
  * ''PRO'' full pronoun
  * ''PROREL'' relative pronoun
  * ''PROWH'' interrogative pronoun
  * ''V'' indicative or conditional verb form
  * ''VIMP'' imperative verb form
  * ''VINF'' infinitive verb form
  * ''VPP'' past participle
  * ''VPR'' present participle
  * ''VS'' subjunctive verb form
  
Additionally, combined annotations can occur, e.g. ''P+D'' for a preposition merged with a determiner, as in //aux//. The following combined tags are listed by their number of occurrences in the corpus:
  * ''CLS+V''
  * ''CLS+CLO''
  * ''CS+CS''
  * ''CLS+CLO+V''
  * ''ADV+CLR+V+ADV''
  * ''DET+NC''
  * ''CLS+CLR''
  * ''CLS+CLR+V''
  * ''PRO+V''
  * ''P+NC''
  * ''CLR+V''
  * ''CLO+V''
  * ''DET+ADJ''
  * ''V+CLS''
  * ''CS+CLS''
  * ''P+PRO''
  * ''ADV+V''
  * ''DET+DET''
  * ''DET+PRO''
  * ''CLO+CLO''
  * ''P+VINF''
  * ''CLS+CLO+P''
  * ''P+ADJ''
  * ''CLS+VS''
  * ''CLS+CLO+CLO''
  * ''CLR+VINF''
  * ''CLS+NC''
  * ''CLS+DET''
  * ''PROWH+V+CLS+CS''
  * ''ADV+ADV+ADV''
  * ''NPP+V''
  * ''CLS+CLR+CLO''
  * ''DET+VPP''
  * ''ADV+ADV+CS''
  * ''ET+CLO+V''
  * ''ADV+VPP''
  * ''ADV+VINF''
  * ''CLS+P''
  * ''P+VPP''
  * ''CLS+VPP''
  * ''CLR+NC''
  * ''ET+CLO''
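Splitting a combined tag back into its parts is not a plain split on ''+'', because two base tags of the French tagset, ''P+D'' and ''P+PRO'', contain a ''+'' themselves. The following is a minimal sketch, assuming the tags are available as plain strings; the function name ''split_combined'' is hypothetical and not part of MElt or the corpus tools. It uses a greedy longest match against the base tag list above:

```python
# Base tags of the French CC tagset as listed above.
# Note that P+D and P+PRO contain a "+" themselves.
BASE_TAGS = {
    "ADJ", "ADJWH", "ADV", "ADVWH", "CC", "CLO", "CLR", "CLS", "CS",
    "DET", "DETWH", "ET", "I", "NC", "NPP", "P", "P+D", "P+PRO",
    "PONCT", "PREF", "PRO", "PROREL", "PROWH", "V", "VIMP", "VINF",
    "VPP", "VPR", "VS",
}


def split_combined(tag: str) -> list[str]:
    """Split a combined tag such as 'CLS+CLO+V' into base tags (hypothetical helper).

    Uses a greedy longest match over the '+'-separated parts, so the amalgam
    tags P+D and P+PRO stay intact. Raises ValueError for unknown parts.
    """
    parts = tag.split("+")
    result = []
    i = 0
    while i < len(parts):
        # No base tag contains more than one "+", so try two parts, then one.
        for width in (2, 1):
            window = parts[i:i + width]
            candidate = "+".join(window)
            if len(window) == width and candidate in BASE_TAGS:
                result.append(candidate)
                i += width
                break
        else:
            raise ValueError(f"unknown tag component in {tag!r}: {parts[i]!r}")
    return result
```

For example, ''split_combined("CLS+CLO+V")'' returns ''["CLS", "CLO", "V"]'', while ''split_combined("P+PRO")'' returns ''["P+PRO"]''. Since ''P+PRO'' appears both in the base tag list and in the list of combined tags, this sketch treats it as the single amalgam tag.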
  
===== Swiss German dialect =====
Five chats from the Swiss German dialectal data (34,683 tokens) have been manually normalized and annotated for part of speech. The corresponding corpus is called WUS_DIALOG_GSW. Three annotations have been added to each token:
  
  * gloss: The manual normalization
  * tt_pos: Part of Speech annotation with [[https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] based on the manually normalized tokens.
  * tt_lem: The lemma as assigned by TreeTagger
  
The [[https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/stts_guide.pdf|tagset]] uses the following tags:

  * ''ADJA'' attributive adjective (including participles used adjectivally), e.g. //das große Haus//, //die versunkene Glocke//
  * ''ADJD'' predicative adjective; adjective used adverbially, e.g. //der Vogel ist blau//, //er fährt schnell//
  * ''ADV'' adverb (never used as attributive adjective), e.g. //sie kommt bald//
  * ''APPR'' preposition; left-hand part of a double preposition, e.g. //auf dem Tisch//, //an der Straße entlang//
  * ''APPRART'' preposition with fused article, e.g. //am Tag//
  * ''APPO'' postposition, e.g. //meiner Meinung nach//
  * ''APZR'' right-hand part of a double preposition, e.g. //an der Straße entlang//
  * ''ART'' article (definite or indefinite), e.g. //die Tante//, //eine Tante//
  * ''CARD'' cardinal number (words or figures), also declined, e.g. //zwei//, //526//, //dreier//
  * ''FM'' foreign word (the actual part of speech in the original language may be appended, as in ''FMADV'' or ''FM-NN''), e.g. //semper fidem//
  * ''ITJ'' interjection, e.g. //Ach!//
  * ''KON'' coordinating conjunction, e.g. //oder ich bezahle nicht//
  * ''KOKOM'' comparative conjunction or particle, e.g. //er arbeitet als Straßenfeger//, //so gut wie du//
  * ''KOUI'' preposition used to introduce an infinitive clause, e.g. //um den König zu töten//
  * ''KOUS'' subordinating conjunction, e.g. //weil er sie gesehen hat//
  * ''NA'' adjective used as noun, e.g. //der Gesandte//
  * ''NE'' names and other proper nouns, e.g. //Moskau//
  * ''NN'' noun (but not adjectives used as nouns), e.g. //der Abend//
  * ''PAV [PROAV]'' pronominal adverb, e.g. //sie spielt damit//
  * ''PAVREL'' pronominal adverb used as relative, e.g. //die Puppe, damit sie spielt//
  * ''PDAT'' demonstrative determiner, e.g. //dieser Mann war schlecht//
  * ''PDS'' demonstrative pronoun, e.g. //dieser war schlecht//
  * ''PIAT'' indefinite determiner (whether occurring on its own or in conjunction with another determiner), e.g. //einige Wochen//, //viele solche Bemerkungen//
  * ''PIS'' indefinite pronoun, e.g. //sie hat viele gesehen//
  * ''PPER'' personal pronoun, e.g. //sie liebt mich//
  * ''PRF'' reflexive pronoun, e.g. //ich wasche mich//, //sie wäscht sich//
  * ''PPOSS'' possessive pronoun, e.g. //das ist meins//
  * ''PPOSAT'' possessive determiner, e.g. //mein Buch//, //das ist der meine/meinige//
  * ''PRELAT'' relative determiner depending on a noun, e.g. //der Mann, dessen Lied ich singe//, //[…], welchen Begriff ich nicht verstehe//
  * ''PRELS'' relative pronoun (i.e. forms of //der// or //welcher//), e.g. //der Herr, der gerade kommt//, //der Herr, welcher nun kommt//
  * ''PTKA'' particle with adjective or adverb, e.g. //am besten//, //zu schnell//, //aufs herzlichste//
  * ''PTKANT'' answer particle, e.g. //ja//, //nein//
  * ''PTKNEG'' negative particle, e.g. //nicht//
  * ''PTKREL'' indeclinable relative particle, e.g. //so//
  * ''PTKVZ'' separable prefix, e.g. //sie kommt an//
  * ''PTKZU'' infinitive particle //zu//
  * ''PWS'' interrogative pronoun, e.g. //wer kommt?//
  * ''PWAT'' interrogative determiner, e.g. //welche Farbe?//
  * ''PWAV'' interrogative adverb, e.g. //wann kommst du?//
  * ''PWAVREL'' interrogative adverb used as relative, e.g. //der Zaun, worüber sie springt//
  * ''PWREL'' interrogative pronoun used as relative, e.g. //etwas, was er sieht//
  * ''TRUNC'' truncated form of compound, e.g. //Vor- und Nachteile//
  * ''VAFIN'' finite auxiliary verb, e.g. //sie ist gekommen//
  * ''VAIMP'' imperative of auxiliary, e.g. //sei still!//
  * ''VAINF'' infinitive of auxiliary, e.g. //er wird es gesehen haben//
  * ''VAPP'' past participle of auxiliary, e.g. //sie ist es gewesen//
  * ''VMFIN'' finite modal verb, e.g. //sie will kommen//
  * ''VMINF'' infinitive of modal, e.g. //er hat es sehen müssen//
  * ''VMPP'' past participle of modal, e.g. //sie hat es gekonnt//
  * ''VVFIN'' finite full verb, e.g. //sie kommt//
  * ''VVIMP'' imperative of full verb, e.g. //bleibt da!//
  * ''VVINF'' infinitive of full verb, e.g. //er wird es sehen//
  * ''VVIZU'' infinitive with incorporated //zu//, e.g. //sie versprach aufzuhören//
  * ''VVPP'' past participle of full verb, e.g. //sie ist gekommen//
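The verb tags in the list above are built compositionally: ''V'', then a letter for the verb type (''V'' full verb, ''A'' auxiliary, ''M'' modal), then a suffix for the form (''FIN'', ''IMP'', ''INF'', ''IZU'', ''PP''). A minimal sketch that decodes such a tag is shown below; the function name ''parse_verb_tag'' is hypothetical and not part of TreeTagger. It only checks the pattern, so it would also accept combinations that the tagset does not define, such as ''VMIMP''.

```python
from typing import Optional, Tuple

# Letter after the initial "V" -> verb type.
VERB_TYPES = {"V": "full verb", "A": "auxiliary", "M": "modal"}
# Remaining suffix -> verb form.
VERB_FORMS = {
    "FIN": "finite",
    "IMP": "imperative",
    "INF": "infinitive",
    "IZU": "infinitive with zu",
    "PP": "past participle",
}


def parse_verb_tag(tag: str) -> Optional[Tuple[str, str]]:
    """Return (verb type, form) for an STTS verb tag such as 'VMPP', else None."""
    if len(tag) < 3 or tag[0] != "V" or tag[1] not in VERB_TYPES:
        return None
    form = VERB_FORMS.get(tag[2:])
    if form is None:
        return None
    return VERB_TYPES[tag[1]], form
```

For example, ''parse_verb_tag("VMPP")'' returns ''("modal", "past participle")'', and non-verb tags such as ''NN'' return ''None''.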
  
As in the French corpus, there are also combined tags such as ''VAFIN+PPER'' when a personal pronoun is agglutinated to a verb (//hätti// for //hätte ich//).
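When the gloss spells out the agglutinated form as separate words, each part of a combined tag can be paired with one gloss word. The sketch below assumes that the gloss of //hätti// is //hätte ich//, i.e. that the gloss contains exactly one whitespace-separated word per tag component; this is an assumption about the data, not a documented property of the corpus, and ''align_agglutinated'' is a hypothetical helper. Splitting on ''+'' is safe here because, unlike the French tagset, no single tag in the list above contains a ''+''.

```python
def align_agglutinated(gloss: str, tag: str) -> list[tuple[str, str]]:
    """Pair each gloss word with one component of a combined STTS tag.

    Assumes one whitespace-separated gloss word per '+'-separated tag part;
    raises ValueError if the counts differ.
    """
    words = gloss.split()
    tags = tag.split("+")
    if len(words) != len(tags):
        raise ValueError(f"{len(words)} gloss words but {len(tags)} tag components")
    return list(zip(words, tags))
```

Under this assumption, ''align_agglutinated("hätte ich", "VAFIN+PPER")'' returns ''[("hätte", "VAFIN"), ("ich", "PPER")]''.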
  
===== Italian =====
The Italian corpus is also annotated with [[https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]], but on the original tokens, i.e. without manual normalization. Two annotations have been added to each token:

  * tt_pos: Part of Speech annotation with TreeTagger
  * tt_lem: The lemma as assigned by TreeTagger
  
The following PoS [[https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/italian-tagset.txt|tagset]] was used:
  * ''ABR'' abbreviation
  * ''ADJ'' adjective
  * ''ADV'' adverb
  * ''CON'' conjunction
  * ''DET:def'' definite article
  * ''DET:indef'' indefinite article
  * ''FW'' foreign word
  * ''INT'' interjection
  * ''LS'' list symbol
  * ''NOM'' noun
  * ''NPR'' proper noun (name)
  * ''NUM'' numeral
  * ''PON'' punctuation
  * ''PRE'' preposition
  * ''PRE:det'' preposition+article
  * ''PRO'' pronoun
  * ''PRO:demo'' demonstrative pronoun
  * ''PRO:indef'' indefinite pronoun
  * ''PRO:inter'' interrogative pronoun
  * ''PRO:pers'' personal pronoun
  * ''PRO:poss'' possessive pronoun
  * ''PRO:refl'' reflexive pronoun
  * ''PRO:rela'' relative pronoun
  * ''SENT'' sentence marker
  * ''SYM'' symbol
  * ''VER:cimp'' verb, subjunctive imperfect
  * ''VER:cond'' verb, conditional
  * ''VER:cpre'' verb, subjunctive present
  * ''VER:futu'' verb, future tense
  * ''VER:geru'' verb, gerund
  * ''VER:impe'' verb, imperative
  * ''VER:impf'' verb, imperfect
  * ''VER:infi'' verb, infinitive
  * ''VER:pper'' verb, past participle
  * ''VER:ppre'' verb, present participle
  * ''VER:pres'' verb, present tense
  * ''VER:refl:infi'' verb, reflexive infinitive
  * ''VER:remo'' verb, simple past
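The Italian tags are hierarchical: a main category (''VER'', ''PRO'', ''DET'', ...) optionally followed by colon-separated subcategories. For frequency counts it is often useful to collapse all subcategories into the main category. A minimal sketch, assuming the ''tt_pos'' values are available as a list of strings (how they are exported from the corpus is not covered here); the function names are hypothetical:

```python
from collections import Counter
from typing import Iterable, List, Tuple


def split_italian_tag(tag: str) -> Tuple[str, List[str]]:
    """Split e.g. 'VER:refl:infi' into ('VER', ['refl', 'infi'])."""
    main, *features = tag.split(":")
    return main, features


def coarse_counts(tags: Iterable[str]) -> Counter:
    """Count tags by main category only, so e.g. all VER:* subtags are merged."""
    return Counter(split_italian_tag(tag)[0] for tag in tags)
```

For instance, ''coarse_counts(["VER:pres", "VER:infi", "NOM", "PRE:det"])'' counts ''VER'' twice and ''NOM'' and ''PRE'' once each.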
  
01_corpus/02_preprocessing/06_pos.1587051932.txt.gz · Last modified: 2022/06/27 09:21 (external edit)
