
2.1 Sub-corpora

As explained in section 1.1, you can work either with the full WUS corpus or with individual sub-corpora. The list of sub-corpora is in the bottom left of the ANNIS interface.

The list of sub-corpora is also a good starting point for finding out which fields are available for your query, and for getting examples and statistics.

Please keep in mind that the browser also lists corpora with lowercase names (e.g. deu-rftagged, ita-tagged, roh). These corpora contain data from our SMS project.

Tokens and messages per sub-corpus

Next to the name of each sub-corpus, you see the number of messages (labelled "Texts") and the number of tokens. You can use these figures for statistics.

Please note: In corpora where not all participants gave permission to use their messages, the token figure is inaccurate, because messages without permission were replaced by placeholders such as redactedQ12tokens55characters. These placeholders are counted as tokens, too. If you need statistics that depend on the number of tokens in a (sub-)corpus, we advise you to work with the corpora with the extension _DEMOG.
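When comparing sub-corpora of different sizes, hit counts are usually normalised by the token figure. The following is a minimal sketch, assuming you have noted a hit count from an ANNIS query and the token figure shown next to the sub-corpus name. The function name relative_frequency and the default base of 10,000 tokens are hypothetical illustrative choices; they are not part of ANNIS.

```python
def relative_frequency(hits: int, tokens: int, per_n: int = 10_000) -> float:
    """Return hits per `per_n` tokens (hypothetical helper, not an ANNIS function).

    `hits` is a match count from an ANNIS query; `tokens` is the token figure
    shown next to the (sub-)corpus name. In corpora with redacted messages,
    the placeholder texts are included in that figure, so the result tends
    to be too low.
    """
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    # Multiply first, then divide, so round inputs give exact results.
    return hits * per_n / tokens


# Illustrative numbers only, not real WUS figures.
print(relative_frequency(50, 100_000))  # 5.0
```

Because the denominator comes straight from the corpus list, the _DEMOG corpora are the safer choice for this kind of calculation.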

Information about the (sub-)corpora

When you click the small i (information) icon to the right of a (sub-)corpus name, you get more information about that corpus, as shown in Figure 1:

Figure 1: Information about a (sub-)corpus

On the right-hand side of the information window, you see which annotations can be queried in the selected sub-corpus.

List of chats in the sub-corpus

By clicking the small document icon next to the information i in the list of sub-corpora, you get a list of all chats in that sub-corpus.

From here, you can click complete chat view to see the whole chat (without annotations). In this list of messages, you can always click an individual message ID to see that message with its annotations.

If you click the small i at the far right of the list of chats, you see all metadata for the respective chat.