01_corpus:start

Differences

This shows you the differences between two versions of the page.

01_corpus:start [2020/04/22 12:39] simone
01_corpus:start [2020/04/22 12:55] simone
Line 2:
 The corpus consists of 617 chats that the Swiss population sent in during 2014 through a fixed procedure, which was communicated in the press to get people interested. Each chat was checked for [[01_corpus:02_preprocessing|permission]] to use it, and chats without permission were removed. Furthermore, available [[01_corpus:03_demographics|demographic data]] were linked to the chats.
  
-Next processing steps comprised [[01_corpus:02_preprocessing:01_anonymization|anonymization]], the annotation of a [[01_corpus:02_preprocessing:04_languages|main language]] per chat and thus the creation of [[01_corpus:01_subcorpora|subcorpora]], application of further annotations ( for [[01_corpus:03_preprocessing:05_language_per_message|languages]], i.e. each message was annotated for its most likely language as opposed to the chat annotation performed in the first step), [[01_corpus:02_preprocessing:06_pos|part of speech annotations]], [[01_corpus:02_preprocessing:07_normalization|normalization]] for part of the dialectal Swiss German data.
+Next processing steps comprised [[01_corpus:02_preprocessing:01_anonymization|anonymization]], the annotation of a [[01_corpus:02_preprocessing:04_languages|main language]] per chat and thus the creation of [[01_corpus:01_subcorpora|subcorpora]], application of further annotations (for [[01_corpus:02_preprocessing:04_languages|languages]], i.e. each message was annotated for its most likely language as opposed to the chat annotation performed in the first step), [[01_corpus:02_preprocessing:06_pos|part of speech annotations]], [[01_corpus:02_preprocessing:07_normalization|normalization]] for part of the dialectal Swiss German data.
  
  
  
01_corpus/start.txt · Last modified: 2022/06/27 09:21 by 127.0.0.1