1.2.6 Part of Speech Tagging
Some sub-corpora have been annotated with part-of-speech (PoS) tags: WUS_DIALOG_GSW, WUS_FRA, WUS_FRA_DEMOG, WUS_ITA and WUS_ITA_DEMOG.
French
The whole French corpus has been annotated with MElt (Modified French TreeBank) using the CC tagset. The available annotations are "mftb_pos" (part of speech) and "mftb_lem" (lemma). The following tags are used:
- ADJ: adjective
- ADJWH: interrogative adjective
- ADV: adverb
- ADVWH: interrogative adverb
- CC: coordinating conjunction
- CLO: object clitic pronoun
- CLR: reflexive clitic pronoun
- CLS: subject clitic pronoun
- CS: subordinating conjunction
- DET: determiner
- DETWH: interrogative determiner
- ET: foreign word
- I: interjection
- NC: common noun
- NPP: proper noun
- P: preposition
- P+D: preposition + determiner amalgam
- P+PRO: preposition + pronoun amalgam
- PONCT: punctuation mark
- PREF: prefix
- PRO: full pronoun
- PROREL: relative pronoun
- PROWH: interrogative pronoun
- V: indicative or conditional verb form
- VIMP: imperative verb form
- VINF: infinitive verb form
- VPP: past participle
- VPR: present participle
- VS: subjunctive verb form
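For frequency counts it is often convenient to collapse the fine-grained CC tags into a few coarse classes. The following is a minimal sketch of such a grouping over "mftb_pos" values; the coarse class names (NOUN, VERB, ADP, ...) and the decision to group the amalgams P+D and P+PRO with prepositions are illustrative assumptions, not part of the corpus annotation.

```python
# Illustrative coarse grouping of the CC tags used in the mftb_pos layer.
# The coarse class names (NOUN, VERB, ...) are hypothetical choices.
COARSE_FR = {
    "ADJ": "ADJ", "ADJWH": "ADJ",
    "ADV": "ADV", "ADVWH": "ADV",
    "CC": "CONJ", "CS": "CONJ",
    "CLO": "PRON", "CLR": "PRON", "CLS": "PRON",
    "PRO": "PRON", "PROREL": "PRON", "PROWH": "PRON",
    "DET": "DET", "DETWH": "DET",
    "NC": "NOUN", "NPP": "NOUN",
    "V": "VERB", "VIMP": "VERB", "VINF": "VERB",
    "VPP": "VERB", "VPR": "VERB", "VS": "VERB",
    # Amalgams (e.g. "du", "auquel") are grouped with prepositions here.
    "P": "ADP", "P+D": "ADP", "P+PRO": "ADP",
    "ET": "OTHER", "I": "OTHER", "PONCT": "OTHER", "PREF": "OTHER",
}


def coarse_fr(mftb_pos: str) -> str:
    """Map an mftb_pos value to a coarse class; unlisted tags give 'UNKNOWN'."""
    return COARSE_FR.get(mftb_pos, "UNKNOWN")


if __name__ == "__main__":
    print([coarse_fr(t) for t in ["CLS", "V", "P+D", "NC", "PONCT"]])
    # ['PRON', 'VERB', 'ADP', 'NOUN', 'OTHER']
```

Returning "UNKNOWN" instead of raising makes unexpected tag values visible in counts without interrupting a pass over the corpus.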
Swiss German dialect
Five chats from the Swiss German dialect data (34,683 tokens) have been manually normalized and annotated for part of speech. The corresponding sub-corpus is called WUS_DIALOG_GSW. Three annotations have been added to each token:
- gloss: The manual normalization
- tt_pos: Part of Speech annotation with TreeTagger based on the manually normalized tokens.
- tt_lem: The lemma as assigned by TreeTagger
The tagset uses the following tags:
- ADJA: attributive adjective (including participles used adjectivally)
- ADJD: predicative adjective; adjective used adverbially
- ADV: adverb (never used as attributive adjective)
- APPR: preposition; left-hand part of double preposition
- APPRART: preposition with fused article
- APPO: postposition
- APZR: right-hand part of double preposition
- ART: article (definite or indefinite)
- CARD: cardinal number (words or figures); also declined
- FM: foreign word (the actual part of speech in the original language may be appended, e.g. FM-ADV, FM-NN)
- ITJ: interjection
- KON: coordinating conjunction
- KOKOM: comparative conjunction or particle
- KOUI: preposition used to introduce an infinitive clause
- KOUS: subordinating conjunction
- NA: adjective used as noun
- NE: names and other proper nouns
- NN: noun (but not adjectives used as nouns)
- PAV [PROAV]: pronominal adverb
- PAVREL: pronominal adverb used as relative
- PDAT: demonstrative determiner
- PDS: demonstrative pronoun
- PIAT: indefinite determiner (whether occurring on its own or in conjunction with another determiner)
- PIS: indefinite pronoun
- PPER: personal pronoun
- PRF: reflexive pronoun
- PPOSS: possessive pronoun
- PPOSAT: possessive determiner
- PRELAT: relative depending on a noun
- PRELS: relative pronoun (i.e. forms of der or welcher)
- PTKA: particle with adjective or adverb
- PTKANT: answer particle
- PTKNEG: negative particle
- PTKREL: indeclinable relative particle
- PTKVZ: separable prefix
- PTKZU: infinitive particle zu
- PWS: interrogative pronoun
- PWAT: interrogative determiner
- PWAV: interrogative adverb
- PWAVREL: interrogative adverb used as relative
- PWREL: interrogative pronoun used as relative
- TRUNC: truncated form of compound
- VAFIN: finite auxiliary verb
- VAIMP: imperative of auxiliary
- VAINF: infinitive of auxiliary
- VAPP: past participle of auxiliary
- VMFIN: finite modal verb
- VMINF: infinitive of modal
- VMPP: past participle of modal
- VVFIN: finite full verb
- VVIMP: imperative of full verb
- VVINF: infinitive of full verb
- VVIZU: infinitive with incorporated zu
- VVPP: past participle of full verb
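The verb tags in this list are compositional: "V", then a letter for the verb type (A = auxiliary, M = modal, V = full verb), then the form (FIN, IMP, INF, IZU, PP). A minimal sketch that decodes a "tt_pos" verb tag into these two features is shown below; the function name is hypothetical, and only the twelve verb tags listed above are accepted, so combinations absent from the list (e.g. VMIMP) and combined tags return None.

```python
# Verb tags from the list above: "V" + verb type + verb form.
LISTED_VERB_TAGS = {
    "VAFIN", "VAIMP", "VAINF", "VAPP",
    "VMFIN", "VMINF", "VMPP",
    "VVFIN", "VVIMP", "VVINF", "VVIZU", "VVPP",
}
VERB_TYPES = {"A": "auxiliary", "M": "modal", "V": "full"}
VERB_FORMS = {
    "FIN": "finite",
    "IMP": "imperative",
    "INF": "infinitive",
    "IZU": "infinitive with incorporated zu",
    "PP": "past participle",
}


def decode_verb_tag(tag: str):
    """Return (verb type, verb form) for a listed verb tag, otherwise None."""
    if tag not in LISTED_VERB_TAGS:
        return None
    # The second character encodes the verb type, the rest the form.
    return VERB_TYPES[tag[1]], VERB_FORMS[tag[2:]]


if __name__ == "__main__":
    for tag in ["VAFIN", "VMPP", "VVIZU", "NN", "VAFIN+PPER"]:
        print(tag, decode_verb_tag(tag))
```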
As in the French corpus, there are also combined tags, such as VAFIN+PPER when a personal pronoun is agglutinated to a verb (e.g. hätti for 'hätte ich').
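Because combined "tt_pos" values join ordinary tags with "+", they can be split and each component checked against the tag list. The sketch below assumes that "+" is the only separator used in combined tags; the function name is hypothetical. Foreign-word variants with an appended part of speech (e.g. FM-ADV) are not in the set and would be reported as unknown by this simple check.

```python
# Tags from the Swiss German list above (PROAV is the alternative name of PAV).
GSW_TAGS = set("""
ADJA ADJD ADV APPR APPRART APPO APZR ART CARD FM ITJ KON KOKOM KOUI KOUS
NA NE NN PAV PROAV PAVREL PDAT PDS PIAT PIS PPER PRF PPOSS PPOSAT PRELAT
PRELS PTKA PTKANT PTKNEG PTKREL PTKVZ PTKZU PWS PWAT PWAV PWAVREL PWREL
TRUNC VAFIN VAIMP VAINF VAPP VMFIN VMINF VMPP VVFIN VVIMP VVINF VVIZU VVPP
""".split())


def split_combined(tag: str):
    """Split a combined tag such as 'VAFIN+PPER' into its components and
    list the components that are not in GSW_TAGS."""
    parts = tag.split("+")
    unknown = [p for p in parts if p not in GSW_TAGS]
    return parts, unknown


if __name__ == "__main__":
    # hätti = 'hätte ich'
    print(split_combined("VAFIN+PPER"))  # (['VAFIN', 'PPER'], [])
```

A plain tag is returned as a one-element list, so the same function can be applied to every token.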
Italian
The Italian corpus is also annotated with TreeTagger, but on the original tokens, i.e. the tokens have not been manually normalized. Two annotations have been added to each token:
- tt_pos: Part of Speech annotation with TreeTagger
- tt_lem: The lemma as assigned by TreeTagger
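Since the French and the TreeTagger-annotated sub-corpora use different annotation names, code that processes several sub-corpora needs to know which layer to read. The sketch below collects the names documented in this section. It assumes that "the French corpus" means WUS_FRA and WUS_FRA_DEMOG and "the Italian corpus" means WUS_ITA and WUS_ITA_DEMOG; the dict-based token record and the function name are hypothetical.

```python
# PoS and lemma layers per sub-corpus, as documented in this section.
# Assumption: "the French corpus" = WUS_FRA and WUS_FRA_DEMOG,
# "the Italian corpus" = WUS_ITA and WUS_ITA_DEMOG.
POS_LAYERS = {
    "WUS_FRA": ("mftb_pos", "mftb_lem"),
    "WUS_FRA_DEMOG": ("mftb_pos", "mftb_lem"),
    "WUS_DIALOG_GSW": ("tt_pos", "tt_lem"),
    "WUS_ITA": ("tt_pos", "tt_lem"),
    "WUS_ITA_DEMOG": ("tt_pos", "tt_lem"),
}


def pos_and_lemma(corpus: str, token: dict):
    """Return (PoS tag, lemma) of a token; the dict-based token is hypothetical."""
    layers = POS_LAYERS.get(corpus)
    if layers is None:
        raise KeyError(f"{corpus} has no PoS annotation")
    pos_layer, lemma_layer = layers
    # Missing annotations are returned as None rather than raising.
    return token.get(pos_layer), token.get(lemma_layer)


if __name__ == "__main__":
    # Made-up token record; only the annotation names come from this section.
    token = {"word": "hätti", "gloss": "hätte ich", "tt_pos": "VAFIN+PPER"}
    print(pos_and_lemma("WUS_DIALOG_GSW", token))  # ('VAFIN+PPER', None)
```

The "gloss" layer exists only in WUS_DIALOG_GSW, so it is read directly rather than through this mapping.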
The following PoS tagset was used:
- ABR: abbreviation
- ADJ: adjective
- ADV: adverb
- CON: conjunction
- DET:def: definite article
- DET:indef: indefinite article
- FW: foreign word
- INT: interjection
- LS: list symbol
- NOM: noun
- NPR: proper name
- NUM: numeral
- PON: punctuation
- PRE: preposition
- PRE:det: preposition + article
- PRO: pronoun
- PRO:demo: demonstrative pronoun
- PRO:indef: indefinite pronoun
- PRO:inter: interrogative pronoun
- PRO:pers: personal pronoun
- PRO:poss: possessive pronoun
- PRO:refl: reflexive pronoun
- PRO:rela: relative pronoun
- SENT: sentence marker
- SYM: symbol
- VER:cimp: verb, subjunctive imperfect
- VER:cond: verb, conditional
- VER:cpre: verb, subjunctive present
- VER:futu: verb, future tense
- VER:geru: verb, gerund
- VER:impe: verb, imperative
- VER:impf: verb, imperfect
- VER:infi: verb, infinitive
- VER:pper: verb, past participle
- VER:ppre: verb, present participle
- VER:pres: verb, present
- VER:refl:infi: verb, reflexive infinitive
- VER:remo: verb, simple past
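In the Italian tagset, subtypes are attached to a main category with colons (DET:def, PRO:pers, VER:refl:infi). Splitting on ":" therefore gives the main category, which is useful for collapsing, say, DET:def and DET:indef into DET. A minimal sketch, with hypothetical function names and the verb subtype meanings taken from the list above:

```python
# Verb subtypes from the Italian list above.
VER_SUBTYPES = {
    "cimp": "subjunctive imperfect", "cond": "conditional",
    "cpre": "subjunctive present", "futu": "future tense",
    "geru": "gerund", "impe": "imperative", "impf": "imperfect",
    "infi": "infinitive", "pper": "past participle",
    "ppre": "present participle", "pres": "present",
    "refl": "reflexive", "remo": "simple past",
}


def parse_it_tag(tag: str):
    """Split a tag like 'VER:refl:infi' into its main category and subtypes."""
    main, *subtypes = tag.split(":")
    return main, subtypes


def describe_verb(tag: str) -> str:
    """Spell out the subtypes of a VER tag, e.g. 'reflexive infinitive'."""
    main, subtypes = parse_it_tag(tag)
    if main != "VER":
        raise ValueError(f"not a verb tag: {tag}")
    return " ".join(VER_SUBTYPES[s] for s in subtypes)


if __name__ == "__main__":
    print(parse_it_tag("DET:indef"))       # ('DET', ['indef'])
    print(describe_verb("VER:refl:infi"))  # reflexive infinitive
```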