1.2.6 Part of Speech Tagging
Some sub-corpora have been annotated with part-of-speech (PoS) information: WUS_DIALOG_GSW, WUS_FRA, WUS_FRA_DEMOG, WUS_ITA and WUS_ITA_DEMOG.
French
The whole French corpus has been annotated with MElt (Modified French TreeBank) using the CC tagset. Two annotations are available: "mftb_pos" (part of speech) and "mftb_lem" (lemma). The following tags are used:
Tag     Description
ADJ     adjective
ADJWH   interrogative adjective
ADV     adverb
ADVWH   interrogative adverb
CC      coordinating conjunction
CLO     object clitic pronoun
CLR     reflexive clitic pronoun
CLS     subject clitic pronoun
CS      subordinating conjunction
DET     determiner
DETWH   interrogative determiner
ET      foreign word
I       interjection
NC      common noun
NPP     proper noun
P       preposition
P+D     preposition+determiner amalgam
P+PRO   preposition+pronoun amalgam
PONCT   punctuation mark
PREF    prefix
PRO     full pronoun
PROREL  relative pronoun
PROWH   interrogative pronoun
V       indicative or conditional verb form
VIMP    imperative verb form
VINF    infinitive verb form
VPP     past participle
VPR     present participle
VS      subjunctive verb form
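To work with the mftb_pos layer programmatically, the tag list above can be kept as a lookup table and used to check exported values. The following is a minimal Python sketch; the names CC_TAGSET and find_unknown_tags are hypothetical, and the input is assumed to be a plain list of mftb_pos strings already exported from the corpus (the export format is not described here).

```python
# CC tagset used for the mftb_pos layer (tags and descriptions as listed above).
CC_TAGSET = {
    "ADJ": "adjective",
    "ADJWH": "interrogative adjective",
    "ADV": "adverb",
    "ADVWH": "interrogative adverb",
    "CC": "coordinating conjunction",
    "CLO": "object clitic pronoun",
    "CLR": "reflexive clitic pronoun",
    "CLS": "subject clitic pronoun",
    "CS": "subordinating conjunction",
    "DET": "determiner",
    "DETWH": "interrogative determiner",
    "ET": "foreign word",
    "I": "interjection",
    "NC": "common noun",
    "NPP": "proper noun",
    "P": "preposition",
    "P+D": "preposition+determiner amalgam",
    "P+PRO": "preposition+pronoun amalgam",
    "PONCT": "punctuation mark",
    "PREF": "prefix",
    "PRO": "full pronoun",
    "PROREL": "relative pronoun",
    "PROWH": "interrogative pronoun",
    "V": "indicative or conditional verb form",
    "VIMP": "imperative verb form",
    "VINF": "infinitive verb form",
    "VPP": "past participle",
    "VPR": "present participle",
    "VS": "subjunctive verb form",
}


def find_unknown_tags(tags):
    """Return the set of mftb_pos values that are not in CC_TAGSET.

    Values are compared whole: P+D and P+PRO are single tags in this
    tagset, so they are deliberately not split on "+".
    """
    return {tag for tag in tags if tag not in CC_TAGSET}
```

For example, find_unknown_tags(["CLS", "V", "P+D", "NC"]) returns an empty set, while any value outside the 29 listed tags is reported.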
Swiss German dialect
Five chats from the Swiss German dialect data (34,683 tokens) have been manually normalized and annotated for part of speech. The corresponding corpus is called WUS_DIALOG_GSW. Three annotations have been added to each token:
- gloss: The manual normalization
- tt_pos: Part-of-speech tag assigned by TreeTagger on the basis of the manually normalized tokens
- tt_lem: The lemma as assigned by TreeTagger
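The three layers above can be pictured as fields of one record per token. The Python sketch below is a hypothetical in-memory representation, not the corpus's actual storage format: the class name GswToken and the field word (for the original dialect form) are assumptions, while gloss, tt_pos and tt_lem mirror the annotation names given above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class GswToken:
    """One WUS_DIALOG_GSW token with its three annotation layers (hypothetical record)."""

    word: str    # original dialect token (assumed field name)
    gloss: str   # manual normalization
    tt_pos: str  # TreeTagger PoS tag, assigned on the basis of the normalization
    tt_lem: str  # TreeTagger lemma


def pos_distribution(tokens):
    """Count tt_pos values; a combined tag such as VAFIN+PPER counts as one value."""
    return Counter(t.tt_pos for t in tokens)


def changed_by_normalization(tokens):
    """List (word, gloss) pairs where the manual normalization differs from the original."""
    return [(t.word, t.gloss) for t in tokens if t.word != t.gloss]
```

Keeping the original form next to the gloss makes it easy to inspect which dialect forms were normalized before tagging.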
The tagset uses the following tags:
Tag          Description
ADJA         attributive adjective (including participles used adjectivally)
ADJD         predicate adjective; adjective used adverbially
ADV          adverb (never used as attributive adjective)
APPR         preposition; left-hand part of double preposition
APPRART      preposition with fused article
APPO         postposition
APZR         right-hand part of double preposition
ART          article (definite or indefinite)
CARD         cardinal number (words or figures); also declined
FM           foreign words (actual part of speech in original language may be appended, e.g. FM-ADV, FM-NN)
ITJ          interjection
KON          coordinating conjunction
KOKOM        comparative conjunction or particle
KOUI         preposition used to introduce infinitive clause
KOUS         subordinating conjunction
NA           adjective used as noun
NE           names and other proper nouns
NN           noun (but not adjectives used as nouns)
PAV [PROAV]  pronominal adverb
PAVREL       pronominal adverb used as relative
PDAT         demonstrative determiner
PDS          demonstrative pronoun
PIAT         indefinite determiner (whether occurring on its own or in conjunction with another determiner)
PIS          indefinite pronoun
PPER         personal pronoun
PRF          reflexive pronoun
PPOSS        possessive pronoun
PPOSAT       possessive determiner
PRELAT       relative depending on a noun
PRELS        relative pronoun (i.e. forms of der or welcher)
PTKA         particle with adjective or adverb
PTKANT       answer particle
PTKNEG       negative particle
PTKREL       indeclinable relative particle
PTKVZ        separable prefix
PTKZU        infinitive particle zu
PWS          interrogative pronoun
PWAT         interrogative determiner
PWAV         interrogative adverb
PWAVREL      interrogative adverb used as relative
PWREL        interrogative pronoun used as relative
TRUNC        truncated form of compound
VAFIN        finite auxiliary verb
VAIMP        imperative of auxiliary
VAINF        infinitive of auxiliary
VAPP         past participle of auxiliary
VMFIN        finite modal verb
VMINF        infinitive of modal
VMPP         past participle of modal
VVFIN        finite full verb
VVIMP        imperative of full verb
VVINF        infinitive of full verb
VVIZU        infinitive with incorporated zu
VVPP         past participle of full verb
As in the French corpus, there are also combined tags, such as VAFIN+PPER when a personal pronoun is agglutinated to a verb (e.g. hätti for 'hätte ich').
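Combined tags can be decomposed into their component tags for lookup. The Python sketch below assumes that "+" is the only separator, as in the examples given in this section (VAFIN+PPER, P+D); the names GSW_TAGS, split_combined_tag and describe_tag are hypothetical, and GSW_TAGS is only an illustrative subset of the table above.

```python
# Illustrative subset of the Swiss German tags listed above.
GSW_TAGS = {
    "VAFIN": "finite auxiliary verb",
    "VMFIN": "finite modal verb",
    "VVFIN": "finite full verb",
    "PPER": "personal pronoun",
    "APPRART": "preposition with fused article",
}


def split_combined_tag(tag):
    """Split a combined tag such as 'VAFIN+PPER' into ['VAFIN', 'PPER']."""
    return tag.split("+")


def describe_tag(tag, tagset):
    """Describe a simple or combined tag using the given tag dictionary."""
    # Whole-value lookup comes first: in the French CC tagset, P+D and P+PRO
    # are single tags whose descriptions should not be assembled from parts.
    if tag in tagset:
        return tagset[tag]
    # Otherwise describe each component; an unknown component raises KeyError.
    return " + ".join(tagset[part] for part in split_combined_tag(tag))
```

With this sketch, describe_tag("VAFIN+PPER", GSW_TAGS) yields "finite auxiliary verb + personal pronoun". Looking up the whole value before splitting lets the same function serve both tagsets.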
Italian
The Italian corpus was also annotated with TreeTagger, but on the basis of the original tokens, i.e. without manual normalization. Two annotations are available:
- tt_pos: Part-of-speech tag assigned by TreeTagger
- tt_lem: The lemma as assigned by TreeTagger
The following PoS tagset was used:
Tag            Description
ABR            abbreviation
ADJ            adjective
ADV            adverb
CON            conjunction
DET:def        definite article
DET:indef      indefinite article
FW             foreign word
INT            interjection
LS             list symbol
NOM            noun
NPR            name
NUM            numeral
PON            punctuation
PRE            preposition
PRE:det        preposition+article
PRO            pronoun
PRO:demo       demonstrative pronoun
PRO:indef      indefinite pronoun
PRO:inter      interrogative pronoun
PRO:pers       personal pronoun
PRO:poss       possessive pronoun
PRO:refl       reflexive pronoun
PRO:rela       relative pronoun
SENT           sentence marker
SYM            symbol
VER:cimp       verb, subjunctive imperfect
VER:cond       verb, conditional
VER:cpre       verb, subjunctive present
VER:futu       verb, future tense
VER:geru       verb, gerund
VER:impe       verb, imperative
VER:impf       verb, imperfect
VER:infi       verb, infinitive
VER:pper       verb, past participle
VER:ppre       verb, present participle
VER:pres       verb, present tense
VER:refl:infi  verb, reflexive infinitive
VER:remo       verb, simple past
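Many Italian tags consist of a main category followed by colon-separated subcategories (e.g. PRO:pers, VER:refl:infi). For coarse-grained counts it can be convenient to group tags by the part before the first colon. This grouping is a choice made for the sketch below, not something the corpus defines; the function names are hypothetical and the input is assumed to be a list of tt_pos strings.

```python
from collections import Counter


def split_it_tag(tag):
    """Split an Italian TreeTagger tag into main category and subcategories.

    'VER:refl:infi' -> ('VER', ['refl', 'infi']); 'NOM' -> ('NOM', [])
    """
    main, *subcategories = tag.split(":")
    return main, subcategories


def coarse_counts(tags):
    """Count tt_pos values by main category (the part before the first colon)."""
    return Counter(split_it_tag(tag)[0] for tag in tags)
```

Under this grouping, DET:def and DET:indef both count as DET, and all VER:* tags count as VER.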
