@@ Line 1: / Line 1: @@
+====== 1.2.6 Part of Speech Tagging ======
+Part-of-speech annotations are available for the following sub-corpora: WUS_DIALOG_GSW, WUS_FRA, WUS_FRA_DEMOG, WUS_ITA, and WUS_ITA_DEMOG.
+===== French =====
+The whole French corpus has been annotated with [[https://team.inria.fr/almanach/fr/melt/|MElt]] (Modified French TreeBank) using the [[http://french-postaggers.tiddlyspot.com/|CC tagset]]. The available annotations are "mftb_pos" (part of speech) and "mftb_lem" (lemma). The following tags are used:
+  * ''ADJ''	adjective
+  * ''ADJWH''	interrogative adjective
+  * ''ADV''	adverb
+  * ''ADVWH''	interrogative adverb
+  * ''CC''	coordinating conjunction
+  * ''CLO''	object clitic pronoun
+  * ''CLR''	reflexive clitic pronoun
+  * ''CLS''	subject clitic pronoun
+  * ''CS''	subordinating conjunction
+  * ''DET''	determiner
+  * ''DETWH''	interrogative determiner
+  * ''ET''	foreign word
+  * ''I''	interjection
+  * ''NC''	common noun
+  * ''NPP''	proper noun
+  * ''P''	preposition
+  * ''P+D''	preposition+determiner amalgam
+  * ''P+PRO''	preposition+pronoun amalgam
+  * ''PONCT''	punctuation mark
+  * ''PREF''	prefix
+  * ''PRO''	full pronoun
+  * ''PROREL''	relative pronoun
+  * ''PROWH''	interrogative pronoun
+  * ''V''	indicative or conditional verb form
+  * ''VIMP''	imperative verb form
+  * ''VINF''	infinitive verb form
+  * ''VPP''	past participle
+  * ''VPR''	present participle
+  * ''VS''	subjunctive verb form
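The tag list above can be grouped into coarser classes when querying the "mftb_pos" annotation. The following is a minimal Python sketch, not part of the corpus tooling: the names `FRENCH_VERB_TAGS` and `is_french_verb` are hypothetical, and the tag sequence is made up for illustration.

```python
# Verb tags copied from the CC tagset list above.
FRENCH_VERB_TAGS = frozenset({"V", "VIMP", "VINF", "VPP", "VPR", "VS"})


def is_french_verb(tag: str) -> bool:
    """Return True if an mftb_pos tag denotes a verb form."""
    return tag in FRENCH_VERB_TAGS


# Made-up tag sequence, for illustration only (not taken from the corpus).
tags = ["CLS", "V", "DET", "NC", "PONCT"]
verbs = [t for t in tags if is_french_verb(t)]
print(verbs)  # ['V']
```

An explicit set is used rather than a prefix test so that the grouping stays correct if further tags are added to the list.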
+===== Swiss German dialect =====
+Five chats of the Swiss German dialectal data (34,683 tokens) have been manually normalized and annotated for part of speech. The corresponding corpus is called WUS_DIALOG_GSW. Three annotations have been added to each token:
+  * gloss: The manual normalization
+  * tt_pos: Part of Speech annotation with [[https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] based on the manually normalized tokens.
+  * tt_lem: The lemma as assigned by TreeTagger
+The [[https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/stts_guide.pdf|tagset]] uses the following tags:
+  * ''ADJA''	attributive adjective (including participles used adjectivally)
+  * ''ADJD''	predicate adjective; adjective used adverbially
+  * ''ADV''	adverb (never used as attributive adjective)
+  * ''APPR''	preposition; left-hand part of double preposition
+  * ''APPRART''	preposition with fused article
+  * ''APPO''	postposition
+  * ''APZR''	right hand part of double preposition
+  * ''ART''	article (definite or indefinite)
+  * ''CARD''	cardinal number (words or figures); also declined
+  * ''FM''	foreign words (actual part of speech in original language may be appended, e.g. FMADV/ FM-NN)
+  * ''ITJ''	interjection
+  * ''KON''	co-ordinating conjunction
+  * ''KOKOM''	comparative conjunction or particle
+  * ''KOUI''	preposition used to introduce infinitive clause
+  * ''KOUS''	subordinating conjunction
+  * ''NA''	adjective used as noun
+  * ''NE''	names and other proper nouns
+  * ''NN''	noun (but not adjectives used as nouns)
+  * ''PAV [PROAV]''	pronominal adverb
+  * ''PAVREL''	pronominal adverb used as relative
+  * ''PDAT''	demonstrative determiner
+  * ''PDS''	demonstrative pronoun
+  * ''PIAT''	indefinite determiner (whether occurring on its own or in conjunction with another determiner)
+  * ''PIS''	indefinite pronoun
+  * ''PPER''	personal pronoun
+  * ''PRF''	reflexive pronoun
+  * ''PPOSS''	possessive pronoun
+  * ''PPOSAT''	possessive determiner
+  * ''PRELAT''	relative depending on a noun
+  * ''PRELS''	relative pronoun (i.e. forms of der or welcher)
+  * ''PTKA''	particle with adjective or adverb
+  * ''PTKANT''	answer particle
+  * ''PTKNEG''	negative particle
+  * ''PTKREL''	indeclinable relative particle
+  * ''PTKVZ''	separable prefix
+  * ''PTKZU''	infinitive particle zu
+  * ''PWS''	interrogative pronoun
+  * ''PWAT''	interrogative determiner
+  * ''PWAV''	interrogative adverb
+  * ''PWAVREL''	interrogative adverb used as relative
+  * ''PWREL''	interrogative pronoun used as relative
+  * ''TRUNC''	truncated form of compound
+  * ''VAFIN''	finite auxiliary verb
+  * ''VAIMP''	imperative of auxiliary
+  * ''VAINF''	infinitive of auxiliary
+  * ''VAPP''	past participle of auxiliary
+  * ''VMFIN''	finite modal verb
+  * ''VMINF''	infinitive of modal
+  * ''VMPP''	past participle of modal
+  * ''VVFIN''	finite full verb
+  * ''VVIMP''	imperative of full verb
+  * ''VVINF''	infinitive of full verb
+  * ''VVIZU''	infinitive with incorporated zu
+  * ''VVPP''	past participle of full verb
+As in the French corpus, there are also combined tags such as //VAFIN+PPER// when a personal pronoun is agglutinated to a verb (//hätti// for 'hätte ich').
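Combined tags like the one above can be broken into their component tags before counting or filtering. The sketch below is one possible approach in Python, assuming that components are always joined by "+"; the names `split_combined_tag` and `FRENCH_ATOMIC_PLUS_TAGS` are hypothetical. Note that the French tag list contains ''P+D'' and ''P+PRO'' as single tags, so these are passed in as exceptions when handling "mftb_pos" values.

```python
# Single French tags that contain "+" (from the French tag list on this page).
FRENCH_ATOMIC_PLUS_TAGS = frozenset({"P+D", "P+PRO"})


def split_combined_tag(tag: str, atomic: frozenset = frozenset()) -> list:
    """Split a combined tag such as 'VAFIN+PPER' into its component tags.

    Tags listed in `atomic` are returned unsplit.
    """
    if tag in atomic:
        return [tag]
    return tag.split("+")


print(split_combined_tag("VAFIN+PPER"))                   # ['VAFIN', 'PPER']
print(split_combined_tag("NN"))                           # ['NN']
print(split_combined_tag("P+D", FRENCH_ATOMIC_PLUS_TAGS))  # ['P+D']
```

This sketch does not cover a combined tag that itself contains one of the atomic "+" tags as a component; whether such tags occur in the corpus is not stated here.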
+===== Italian =====
+The Italian corpus is also annotated with [[https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]], but on the original tokens, i.e. without manual normalization. Two annotations have been added to each token:
+   * tt_pos: Part of Speech annotation with TreeTagger
+   * tt_lem: The lemma as assigned by TreeTagger
+The following PoS [[https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/italian-tagset.txt|tagset]] was used:
+  * ''ABR''	abbreviation
+  * ''ADJ''	adjective
+  * ''ADV''	adverb
+  * ''CON''	conjunction
+  * ''DET:def''	definite article
+  * ''DET:indef''	indefinite article
+  * ''FW''	foreign word
+  * ''INT''	interjection
+  * ''LS''	list symbol
+  * ''NOM''	noun
+  * ''NPR''	name
+  * ''NUM''	numeral
+  * ''PON''	punctuation
+  * ''PRE''	preposition
+  * ''PRE:det''	preposition+article
+  * ''PRO''	pronoun
+  * ''PRO:demo''	demonstrative pronoun
+  * ''PRO:indef''	indefinite pronoun
+  * ''PRO:inter''	interrogative pronoun
+  * ''PRO:pers''	personal pronoun
+  * ''PRO:poss''	possessive pronoun
+  * ''PRO:refl''	reflexive pronoun
+  * ''PRO:rela''	relative pronoun
+  * ''SENT''	sentence marker
+  * ''SYM''	symbol
+  * ''VER:cimp''	verb conjunctive imperfect
+  * ''VER:cond''	verb conditional
+  * ''VER:cpre''	verb conjunctive present
+  * ''VER:futu''	verb future tense
+  * ''VER:geru''	verb gerund
+  * ''VER:impe''	verb imperative
+  * ''VER:impf''	verb imperfect
+  * ''VER:infi''	verb infinitive
+  * ''VER:pper''	verb participle perfect
+  * ''VER:ppre''	verb participle present
+  * ''VER:pres''	verb present
+  * ''VER:refl:infi''	verb reflexive infinitive
+  * ''VER:remo''	verb simple past
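The Italian tags above are hierarchical: a coarse category is followed by optional subtypes separated by ":". A short Python sketch of how "tt_pos" values could be reduced to their coarse category is given below; the function names are hypothetical and the example tag sequence is made up for illustration.

```python
from collections import Counter


def coarse_category(tag: str) -> str:
    """Return the part before the first ':' (e.g. 'VER' for 'VER:refl:infi')."""
    return tag.split(":", 1)[0]


def subtypes(tag: str) -> list:
    """Return the subtype components after the coarse category, if any."""
    return tag.split(":")[1:]


print(coarse_category("VER:refl:infi"))  # VER
print(subtypes("VER:refl:infi"))         # ['refl', 'infi']
print(coarse_category("NOM"))            # NOM

# Made-up tag sequence, for illustration only (not taken from the corpus).
tags = ["PRO:pers", "VER:pres", "DET:def", "NOM", "VER:infi", "SENT"]
counts = Counter(coarse_category(t) for t in tags)
print(counts["VER"])  # 2
```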