01_corpus:02_preprocessing:03_emojis
Differences
This shows you the differences between two versions of the page.
01_corpus:02_preprocessing:03_emojis [2019/12/18 09:00] – simone | 01_corpus:02_preprocessing:03_emojis [2022/06/27 09:21] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== Emojis ====== | + | ====== |
- | Emojis are characters in Unicode. The application WhatsApp uses special fonts so that emojis look the same on all operating systems. In our corpus browsers, emojis can be displayed, but they are rendered in the font that is used by the user; thus, it cannot be guarantied | + | Emojis are characters in Unicode. The application WhatsApp uses special fonts so that emojis look the same on all operating systems. In our corpus browsers, emojis can be displayed, but they are rendered in the font that is used by the user; thus, it cannot be guaranteed |
+ | |||
+ | Querying emojis is not an easy task. We decided to encode them in the messages, e.g. as '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
- | Querying emojis might not always be easy. We therefor decided to encode them in texts. This emoji 😺 would e.g. become // | ||
- | * // | ||
- | * // | ||
- | * // | ||
- | You can thus query for individual emojis or for their encodings. | ||
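The idea of encoding emojis as text so that they can be queried can be illustrated with a minimal Python sketch. The encoding actually used in the corpus is not recoverable from this diff, so the `emoji_<unicode_name>` label format, the `encode_emojis` function, and its `prefix` parameter are all hypothetical and serve only as an illustration. The sketch also uses the Unicode category `So` (Symbol, other) as a rough stand-in for "is an emoji", which is an assumption: it also matches symbols such as the copyright sign, and it does not handle multi-character emoji sequences.

```python
import unicodedata


def encode_emojis(text, prefix="emoji_"):
    """Replace each symbol character with a textual label.

    Hypothetical scheme: the label is the prefix plus the lowercased
    Unicode character name, with spaces and hyphens turned into
    underscores. This is NOT the corpus's documented encoding.
    """
    out = []
    for ch in text:
        # Assumption: category "So" approximates emojis. It also covers
        # non-emoji symbols, and ZWJ sequences are not treated as a unit.
        if unicodedata.category(ch) == "So":
            name = unicodedata.name(ch, None)
            if name is None:
                # Fall back to the code point if this Python's Unicode
                # database has no name for the character.
                label = "u{:04x}".format(ord(ch))
            else:
                label = name.lower().replace(" ", "_").replace("-", "_")
            out.append(prefix + label)
        else:
            out.append(ch)
    return "".join(out)


# U+1F600 is GRINNING FACE.
encoded = encode_emojis("hi \U0001F600")
print(encoded)
```

With a textual label of this kind, a message can be searched with an ordinary string or regular-expression query on the label, independently of how the emoji glyph is rendered by the user's font.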
01_corpus/02_preprocessing/03_emojis.1576656022.txt.gz · Last modified: 2022/06/27 09:21 (external edit)