01_corpus:02_preprocessing:07_normalization

Differences

This shows you the differences between two versions of the page.

01_corpus:04_annotations:03_normalization [2019/11/06 16:35] simone
01_corpus:02_preprocessing:07_normalization [2022/06/27 09:21] (current) – external edit 127.0.0.1
Line 1: Line 1:
-====== Normalization ======
+====== 1.2.7 Normalization ======
-Normalization is the task of "translating" non-standard language into standard language. It can be performed manually or automatically with computational linguistic tools.
+Normalization is the task of "translating" non-standard language data into standard language. It can be performed manually or automatically with computational linguistics tools.
  
-In the case of our corpus, we have manually normalized some data in the Swiss German dialect, resulting in the corpus WUS_DIALOG_GSW.
+In the case of our corpus, we have manually normalized some data in the Swiss German dialect, resulting in the corpus WUS_DIALOG_GSW (5 chats, 34,683 tokens).
  
-Another set of data was process automatically. You can read more about that project in: 
- 
-Ruzsics, Tatiana; Lusetti, Massimo; Göhring, Anne; Samardžić, Tanja; Stark, Elisabeth (2019): Neural Text Normalization with Adapted Decoding and PoS Features. [[https://www.cambridge.org/core/journals/natural-language-engineering/article/neural-text-normalization-with-adapted-decoding-and-pos-features/474B380A32EF96CCED1708229848F3FB|Natural Language Engineering]]. 
- 
-This data will be made available soon. 
01_corpus/02_preprocessing/07_normalization.1573054536.txt.gz · Last modified: 2022/06/27 09:21 (external edit)
