02_browsing:04_queries:03_regex
Differences
This shows you the differences between two versions of the page.
02_browsing:04_queries:02_regex [2020/04/17 21:00] – simone | 02_browsing:04_queries:03_regex [2020/04/21 11:24] – simone | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== Regular Expressions ====== | + | ====== |
To search for spelling variants, different forms of a lemma, and the like, you need to write regular expressions (RegEx) in ANNIS. To do so, put your query between slashes. | To search for spelling variants, different forms of a lemma, and the like, you need to write regular expressions (RegEx) in ANNIS. To do so, put your query between slashes. | ||
Line 129: | Line 129: | ||
===Word boundaries=== | ===Word boundaries=== | ||
- | In ANNIS you can query on different layers. | + | In ANNIS you can query on different layers. |
- | Let us look again at the phrase | + | Let us look again at the sentence |
|the|man|manually|attached|the|tube|in|manchester| | |the|man|manually|attached|the|tube|in|manchester| | ||
Line 139: | Line 139: | ||
|the man manually attached the tube in Manchester| | |the man manually attached the tube in Manchester| | ||
- | Accordingly, | + | Accordingly, |
- | If you query for //man// on the message level, you will find nothing, because ANNIS will search for a whole message that contains only these three characters. In order to actually find the word you are looking for, you have to query for "any characters followed by the string //man// followed by any characters" | + | If you query for //man// on the message level, you will find nothing, because ANNIS will search for a whole message that contains only these three characters. In order to actually find the word you are looking for, you have to query for "any characters |
- | msg=/.*?man.*/ | + | '' |
and will find //man// but also // | and will find //man// but also // | ||
- | If you want to find only //man//, you have to query for the three letters surrounded by boundaries (i.e. spaces, tabs, full stops, commas, newlines etc.). The string for a boundary is //\b//. The query for //man// and only //man// within a message would thus look as follows: | + | If you want to find only //man//, you have to query for the three letters surrounded by boundaries (i.e. spaces, tabs, full stops, commas, newlines etc.). The string for a boundary is '' |
- | msg=/ | + | '' |
Line 156: | Line 156: | ||
====Quantifiers==== | ====Quantifiers==== | ||
- | Sometimes you might be looking for an expression which can be written with or without repeating letters. E.g. you might want to look for //hallo, haaallo, halooooo// | + | Sometimes you might be looking for an expression which can be written with or without repeating letters |
- | | + | |
- | * ***** an asterisk means a repetition of 0 or more times | + | |
- | | + | |
- | | + | |
Example: | Example: | ||
- | /h+a+l+o+/ | + | '' |
will find all variants of hallo | will find all variants of hallo | ||
- | |||
- | |||
- | Using quantifiers is much more capable and demanding than this. The examples given here are called //greedy//, there are also //non greedy quantifiers// | ||
- | |||
- | Hint: if you find these options too complicated, | |
==== Alternatives==== | ==== Alternatives==== | ||
- | Above, you have seen that you can query for different letters in one spot, e.g. you can search for //man// and //men// with the expression | + | Above, you have seen that you can query for different letters in one spot, e.g. you can search for //man// and //men// with the expression |
Example: | Example: | ||
- | n(8|acht|ight|uit) | + | '' |
will look for: | will look for: | ||
- | n8 | + | * //n8// |
- | nacht | + | * //nacht// |
- | night | + | * //night// |
- | nuit | + | * //nuit// |
==== A final word==== | ==== A final word==== | ||
- | What you have read here, is only a selection of the possibilities RegEx offers. To keep things more or less simple for you, we tried to document all the features you are likely to use while omitting everything you probably will not care about. Also, there are different implementations of RegEx in different programs and they support different features | + | What you have read here is only a selection |
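The behaviour described in the sections on word boundaries, quantifiers and alternatives can be tried outside ANNIS. The following is a minimal sketch in Python, not ANNIS itself: it assumes that a RegEx query has to match the whole value of a layer (a whole token or a whole message), which is emulated here with ''re.fullmatch''. The helper ''matches_whole_value'', the sample lists, and the boundary pattern ''.*\bman\b.*'' are illustrative assumptions; Python's RegEx flavour may also differ from the one ANNIS uses.

```python
import re

# Sample data taken from the example sentence on this page.
message = "the man manually attached the tube in manchester"
tokens = message.split()


def matches_whole_value(pattern, value):
    """Return True if the pattern matches the entire value.

    Assumption: this mimics a query that must cover the whole
    annotation value (a whole token or a whole message).
    """
    return re.fullmatch(pattern, value) is not None


# Token layer: the bare pattern finds the token "man" only.
man_tokens = [t for t in tokens if matches_whole_value(r"man", t)]

# Message layer: the bare pattern fails, because the message
# consists of more than these three characters.
bare_man_on_message = matches_whole_value(r"man", message)

# "Any characters, then man, then any characters" matches the message,
# but it also matches a message that only contains "manually".
loose_hit = matches_whole_value(r".*?man.*", message)
loose_false_friend = matches_whole_value(r".*?man.*", "he did it manually")

# With word boundaries, only the standalone word "man" counts.
strict_hit = matches_whole_value(r".*\bman\b.*", message)
strict_false_friend = matches_whole_value(r".*\bman\b.*", "he did it manually")

# Quantifiers: "+" means one or more repetitions of the preceding letter.
greetings = ["hallo", "haaallo", "halooooo", "hello"]
hallo_variants = [g for g in greetings if matches_whole_value(r"h+a+l+o+", g)]

# Alternatives: the pipe separates the options inside the brackets.
nights = ["n8", "nacht", "night", "nuit", "noche"]
night_variants = [n for n in nights if matches_whole_value(r"n(8|acht|ight|uit)", n)]

print(man_tokens, bare_man_on_message)
print(loose_hit, loose_false_friend)
print(strict_hit, strict_false_friend)
print(hallo_variants)
print(night_variants)
```

The full-match helper is used because it reproduces the effect described above: a bare //man// finds the token but not the message. If ANNIS treats some layer differently, only that helper would need to change.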
02_browsing/04_queries/03_regex.txt · Last modified: 2022/06/27 09:21 by 127.0.0.1