02_browsing:02_layers
Differences
This shows you the differences between two versions of the page.
02_browsing:02_layers [2020/01/06 16:59] – simone | 02_browsing:02_layers [2022/06/27 09:21] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== 2.2 Layers of information ====== | ====== 2.2 Layers of information ====== | ||
- | WhatsApp messages are built up in a hierarchy: a chat contains messages that contain tokens that contain characters. A corpus of WhatsApp chats should allow for all these layers to be queried. Additionally, | + | WhatsApp messages are built up in a hierarchy: a chat contains messages that contain tokens that contain characters. A corpus of WhatsApp chats should allow for all these layers to be queried. Additionally, |
These layers are clearly visible when browsing the results of a query: | These layers are clearly visible when browsing the results of a query: | ||
{{ : | {{ : | ||
+ | Figure 1: Representation of layers when browsing results | ||
===== Chats ===== | ===== Chats ===== | ||
- | In this example, the chat appears as an ID (chat138) at the top, in pink. To see the whole chat, use one of the two options at the very bottom: chat in context (faster) or the whole chat (can be slow). When you click on the little | + | In this example, the chat appears as an ID (chat138) at the top of figure 1, in pink. To see the whole chat, use one of the two options at the very bottom: |
===== Messages ===== | ===== Messages ===== | ||
- | In this pink chat, you see three selected messages in blue: | + | In the chat in figure 1, you see three selected messages in blue: |
* Message 165379: Anke adesso se vuoi | * Message 165379: Anke adesso se vuoi | ||
* Message 165380: Aeh ho solo 10 percento di batteria xo | * Message 165380: Aeh ho solo 10 percento di batteria xo | ||
* Message 165381: Ah ecco | * Message 165381: Ah ecco | ||
- | As you can see, these messages have metadata assigned to them, as well, e.g. the message ID and the speaker (these pieces of information are always available) as well as information provided by the informant such as age, mother tongue, etc. | + | As you can see, these messages also have metadata assigned to them, e.g. the message ID and the speaker (both are always available), as well as information provided by the informant, such as age, mother tongue, etc. |
===== Tokens ===== | ===== Tokens ===== | ||
- | The individual tokens are annotated | + | The individual tokens are marked |
- | Tokens, too, can have annotations assigned to them. In the example shown above, | + | Tokens, too, can have annotations assigned to them. Figure 1 shows the following metadata: |
* Gloss: a normalization, | * Gloss: a normalization, | ||
- | * tt_pos: A part-of-speech annotation generated with the parser | + | * tt_pos: A part-of-speech annotation generated with [[https:// |
* tt_lem: The lemma for each token as it was created by TreeTagger. | * tt_lem: The lemma for each token as it was created by TreeTagger. | ||
Line 32: | Line 33: | ||
Examples: | Examples: | ||
- | * If you want to see the whole message 165380, your query would be //msg_id=" | + | * If you want to see the whole message 165380, your query is '' |
- | * If you want to find verbs in the present tense, your query is //tt_pos=" | + | * If you want to find verbs in the present tense, your query is '' |
To see the query-labels for the chat as well as all the labels available in a specific sub-corpus, check the information for the [[02_browsing: | To see the query-labels for the chat as well as all the labels available in a specific sub-corpus, check the information for the [[02_browsing: |
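
The layer hierarchy described on this page (a chat contains messages, which contain tokens, which contain characters, with annotations attached to tokens) can be sketched as a small data model. This is only an illustration, not the corpus's actual storage format or query engine: the class names, the ''find_message'' and ''find_tokens'' helpers, the speaker names and the gloss values are all assumptions made up for the example. Only the chat ID, the message IDs and the message texts are taken from figure 1.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Token:
    text: str
    # Annotation label -> value, e.g. "gloss", "tt_pos", "tt_lem".
    annotations: Dict[str, str] = field(default_factory=dict)

    @property
    def characters(self) -> List[str]:
        # Lowest layer: the characters that make up the token.
        return list(self.text)


@dataclass
class Message:
    msg_id: str
    speaker: str  # hypothetical placeholder values are used below
    tokens: List[Token]

    @property
    def text(self) -> str:
        return " ".join(token.text for token in self.tokens)


@dataclass
class Chat:
    chat_id: str
    messages: List[Message] = field(default_factory=list)

    def find_message(self, msg_id: str) -> Optional[Message]:
        # Message-level lookup, comparable to asking for one message ID.
        for message in self.messages:
            if message.msg_id == msg_id:
                return message
        return None

    def find_tokens(self, label: str, value: str) -> List[Token]:
        # Token-level lookup: all tokens whose annotation `label` equals `value`.
        return [
            token
            for message in self.messages
            for token in message.tokens
            if token.annotations.get(label) == value
        ]


def make_message(msg_id: str, speaker: str, text: str,
                 glosses: Optional[Dict[str, str]] = None) -> Message:
    # Naive whitespace tokenization; the real corpus tokenizer may differ.
    glosses = glosses or {}
    tokens = [
        Token(word, {"gloss": glosses[word]} if word in glosses else {})
        for word in text.split()
    ]
    return Message(msg_id, speaker, tokens)


# Speakers and gloss values are illustrative assumptions, not corpus data.
chat = Chat("chat138", [
    make_message("165379", "speaker_A", "Anke adesso se vuoi", {"Anke": "Anche"}),
    make_message("165380", "speaker_B", "Aeh ho solo 10 percento di batteria xo",
                 {"xo": "però"}),
    make_message("165381", "speaker_A", "Ah ecco"),
])
```

In this sketch, ''chat.find_message("165380")'' returns the second message as a whole, while ''chat.find_tokens("gloss", "Anche")'' works one layer down and returns only the matching token.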
02_browsing/02_layers.txt · Last modified: 2022/06/27 09:21 by 127.0.0.1