01_corpus:start

Differences

This shows you the differences between two versions of the page.

01_corpus:start [2020/04/17 20:08] – ↷ Links adapted because of a move operation simone
01_corpus:start [2020/04/22 12:55] simone
Line 1:
 ====== 1. THE CORPUS ======
-The corpus consists of 617 chats that were sent in by the Swiss population in 2014 through a fixed procedure that was communicated in the press in order to get people interested. The individual chats were checked for their [[01_corpus:02_preprocessing|permission]] to use them and chats that did not have it were [[01_corpus:03_preprocessing:05_removed|removed]]. Furthermore, available [[01_corpus:03_demographics|demographic data]] were linked to the chats.
+The corpus consists of 617 chats that were sent in by the Swiss population in 2014 through a fixed procedure that was communicated in the press in order to get people interested. The individual chats were checked for their [[01_corpus:02_preprocessing|permission]] to use them and chats that did not have it were removed. Furthermore, available [[01_corpus:03_demographics|demographic data]] were linked to the chats.
  
-Next processing steps comprised [[01_corpus:02_preprocessing:01_anonymization|anonymization]], the annotation of a [[01_corpus:02_preprocessing:04_languages|main language]] per chat and thus the creation of [[01_corpus:01_subcorpora|subcorpora]], application of further annotations ( for [[01_corpus:03_preprocessing:05_language_per_message|languages]], i.e. each message was annotated for its most likely language as opposed to the chat annotation performed in the first step), [[01_corpus:02_preprocessing:06_pos|part of speech annotations]], [[01_corpus:02_preprocessing:07_normalization|normalization]] for part of the dialectal Swiss German data.
+Next processing steps comprised [[01_corpus:02_preprocessing:01_anonymization|anonymization]], the annotation of a [[01_corpus:02_preprocessing:04_languages|main language]] per chat and thus the creation of [[01_corpus:01_subcorpora|subcorpora]], application of further annotations (for [[01_corpus:02_preprocessing:04_languages|languages]], i.e. each message was annotated for its most likely language as opposed to the chat annotation performed in the first step), [[01_corpus:02_preprocessing:06_pos|part of speech annotations]], [[01_corpus:02_preprocessing:07_normalization|normalization]] for part of the dialectal Swiss German data.
  
  
  
01_corpus/start.txt · Last modified: 2022/06/27 09:21 by 127.0.0.1
