02_browsing:02_layers

Differences

This shows you the differences between two versions of the page.

02_browsing:02_layers [2019/11/06 12:12] – [Labels] simone
02_browsing:02_layers [2020/01/06 16:59] simone
Line 2:
 WhatsApp messages are built up in a hierarchy: a chat contains messages, which contain tokens, which contain characters. A corpus of WhatsApp chats should allow all of these layers to be queried. Additionally, there is meta-data about the chats (e.g. number of messages), about the messages (e.g. the timestamp when each was written), about the informants (e.g. their age) and about the tokens (e.g. part of speech). This makes our corpus a rather challenging and complex endeavor.
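The hierarchy described above can be pictured as nested records, each carrying its own meta-data. The following is only a minimal sketch under stated assumptions, not the corpus's actual schema: all class names, field names and example values are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Informant:
    # Informant-level meta-data (hypothetical field).
    age: Optional[int] = None


@dataclass
class Token:
    text: str
    # Token-level annotations, e.g. a part-of-speech tag (hypothetical layout).
    annotations: Dict[str, str] = field(default_factory=dict)

    @property
    def characters(self) -> List[str]:
        # The lowest layer: the characters that make up the token.
        return list(self.text)


@dataclass
class Message:
    # Message-level meta-data: when it was written and by whom.
    timestamp: datetime
    informant: Informant
    tokens: List[Token] = field(default_factory=list)


@dataclass
class Chat:
    messages: List[Message] = field(default_factory=list)

    @property
    def num_messages(self) -> int:
        # Chat-level meta-data derived from the messages it contains.
        return len(self.messages)


def build_example_chat() -> Chat:
    # All values here are invented for illustration only.
    informant = Informant(age=30)
    message = Message(
        timestamp=datetime(2020, 1, 6, 16, 59),
        informant=informant,
        tokens=[Token("xo")],
    )
    return Chat(messages=[message])
```

In this sketch a query can enter at any layer: `build_example_chat().num_messages` reads chat-level meta-data, while `messages[0].tokens[0].characters` descends to the character layer.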
  
-These layers can nicely seen when browsing results from a query:
+These layers can nicely be seen when browsing results from a query:
 {{ :02_browsing:layers.png?direct&600 |}}
  
Line 21:
 The individual tokens are annotated in green in the above example, and they are aligned to the message to which they belong.
  
-Tokens, too, (can) have meta data that is assigned to them. In the example shown above, you have the following meta data that was created by our team or by our computational linguists:
+Tokens, too, (can) have annotations that are assigned to them. In the example shown above, you have the following meta data that was created by our team or by our computational linguists:
   * Gloss: a normalization, i.e. a "translation" into standard spelling. A good example here is //xo//, which was normalized as <però>.
   * tt_pos: A part-of-speech annotation generated with the tagger [[https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]].
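
To illustrate how a token-level annotation such as the gloss might be used, here is a small sketch that picks out tokens whose normalized form differs from what was actually typed. It assumes tokens are represented as plain dictionaries with the hypothetical keys `text` and `gloss`; this is not the corpus's real storage format, and the second example token is invented.

```python
from typing import Dict, List

# Hypothetical representation: one dict per token.
Token = Dict[str, str]


def normalized_tokens(tokens: List[Token]) -> List[Token]:
    """Return the tokens whose gloss differs from the surface spelling."""
    return [t for t in tokens if "gloss" in t and t["gloss"] != t["text"]]


example_tokens: List[Token] = [
    {"text": "xo", "gloss": "però"},    # example taken from the text above
    {"text": "ciao", "gloss": "ciao"},  # invented token, already standard
]

changed = normalized_tokens(example_tokens)
```

Here `changed` contains only the //xo// token, since its gloss differs from its original spelling.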
02_browsing/02_layers.txt · Last modified: 2022/06/27 09:21 by 127.0.0.1
