Normalization

Normalization is the task of "translating" non-standard language into standard language. It can be performed manually or automatically with computational linguistic tools.
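As a rough illustration of the automatic case, the simplest possible normalizer is a word-by-word lookup. The sketch below is only a toy under stated assumptions: the lexicon entries and the function name `normalize` are hypothetical examples chosen for illustration, not data or code from the corpus, and the neural approach cited further down works quite differently.

```python
# Hypothetical toy lexicon: illustrative Swiss German -> Standard German pairs.
# These entries are assumptions for the example, not taken from the corpus.
TOY_LEXICON = {
    "isch": "ist",
    "nöd": "nicht",
    "gsi": "gewesen",
}


def normalize(tokens, lexicon=TOY_LEXICON):
    """Replace each token with its standard form if the lexicon knows it.

    Lookup is case-insensitive; unknown tokens are returned unchanged.
    """
    return [lexicon.get(token.lower(), token) for token in tokens]


if __name__ == "__main__":
    print(normalize("das isch nöd guet gsi".split()))
```

A lookup like this cannot handle unseen spellings or words whose standard form depends on context, which is one reason manual annotation and learned models are used in practice.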

In the case of our corpus, we have manually normalized some data in the Swiss German dialect, resulting in the corpus WUS_DIALOG_GSW.

Another set of data was processed automatically. You can read more about that project in: Ruzsics, Tatiana; Lusetti, Massimo; Göhring, Anne; Samardžić, Tanja; Stark, Elisabeth (2019): Neural Text Normalization with Adapted Decoding and PoS Features. Natural Language Engineering.

This data will be made available soon.

01_corpus/02_preprocessing/07_normalization.1573054523.txt.gz · Last modified: 2022/06/27 09:21 (external edit)
