Regular Expressions
The search functions in the graphical user interface and the console allow the use of regular expressions.
This document provides some help on how regular expressions are constructed. The underlying software that evaluates the
regular expressions is the class QRegExp of the Qt framework. The documentation of QRegExp is available at
http://doc.qt.nokia.com/qt-maemo-4.6/qregexp.html.
It provides many additional explanations, e.g. some hints for Perl users.
Creating Regular Expressions
Regular expressions consist of characters, sets of characters (e.g. ranges like A-Z), quantifiers, and
placeholders or wildcards. In combination, these elements form a pattern that matches words (or texts)
with the same structure. The elements are presented as a list so that this text can also serve as a reference.
- Characters
- Every character stands for itself when used without any of the following constructs.
- Character Sets
- Sets of characters are surrounded by [] and can define ranges with a dash.
Sets can be negated with a leading ^. Then they match any single character except their content.
Examples:
- [abc] matches: "a", "b", or "c".
- [A-Z] matches: any single capital letter from "A" to "Z".
- [^a-d] matches: any single character that is not "a", "b", "c", or "d".
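The character sets above can be tried out with a short script. This is only a sketch: Python's re module is used as a stand-in for QRegExp, and the helper matches_fully is a hypothetical name introduced for illustration. It tests whether a whole string, not just a part of it, matches the set.

```python
import re

def matches_fully(pattern, text):
    """Return True if the entire text matches the pattern (Python re, not QRegExp)."""
    return re.fullmatch(pattern, text) is not None

# [abc]: exactly one of the listed characters
print(matches_fully("[abc]", "b"))    # True
print(matches_fully("[abc]", "d"))    # False

# [A-Z]: a range of capital letters
print(matches_fully("[A-Z]", "Q"))    # True
print(matches_fully("[A-Z]", "q"))    # False

# [^a-d]: negated set, anything except a, b, c, d
print(matches_fully("[^a-d]", "e"))   # True
print(matches_fully("[^a-d]", "c"))   # False
print(matches_fully("[^a-d]", "7"))   # True, a negated set also accepts non-letters
```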
- Quantifiers
- Quantifiers are defined by {}. They can have one or two values: {x}, {x,y}. When one value is
used, the quantifier matches exactly x times. Two values represent a minimal and a maximal quantity:
{<min>,<max>}. Either value can be left open: an omitted min is interpreted as 0, an omitted max as unlimited.
Quantifiers are placed directly after the element they quantify.
Examples:
- A{3} matches: "AAA", nothing else.
- A{1,3} matches: "A", "AA", "AAA", but not "AAAA".
- [0-9]{1,1} matches: "0", "1", ..., "9", but not "13".
- A{,3} matches: "" (the empty string), "A", "AA", "AAA", but not "AAAA".
- A{3,} matches: "AAA", "AAAA", ..., but not "AA", "A", or "".
There are also these shortcuts for some quantifiers:
? Matches zero or one (same as {0,1})
+ Matches one or more (same as {1,})
* Matches zero or more (same as {0,})
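To see how the quantifiers behave, the following sketch checks a few candidate strings against each of them. It assumes Python's re module as a stand-in for QRegExp; the names candidates, patterns and accepted are made up for this illustration. Each candidate must match as a whole.

```python
import re

# Candidate strings, from the empty string up to four A's.
candidates = ["", "A", "AA", "AAA", "AAAA"]

# Quantifier forms from the list above, including the three shortcuts.
patterns = ["A{3}", "A{1,3}", "A{,3}", "A{3,}", "A?", "A+", "A*"]

# For every pattern, collect the candidates that match completely.
accepted = {
    pattern: [text for text in candidates if re.fullmatch(pattern, text)]
    for pattern in patterns
}

for pattern, texts in accepted.items():
    print(pattern, texts)
```

In this sketch A{,3} accepts the empty string and up to three A's, while A{3,} accepts three or more, so the open side of a quantifier is 0 for the minimum and unlimited for the maximum.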
- Wildcards
- Wildcards stand in for one or more arbitrary characters. "." stands for
exactly one arbitrary character. ".*" stands for any number of arbitrary characters, including none.
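A brief sketch of the two wildcards, again assuming Python's re module in place of QRegExp. The helper matches_fully is a hypothetical name, and the sample words are chosen only for illustration.

```python
import re

def matches_fully(pattern, text):
    """Return True if the entire text matches the pattern (Python re, not QRegExp)."""
    return re.fullmatch(pattern, text) is not None

# "." needs exactly one character in its place.
print(matches_fully("c.de", "code"))     # True
print(matches_fully("c.de", "cde"))      # False, nothing stands where "." is
print(matches_fully("c.de", "coode"))    # False, two characters where "." is

# ".*" accepts any number of characters, including none.
print(matches_fully("c.*e", "ce"))       # True
print(matches_fully("c.*e", "cascade"))  # True
```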
- Word borders
- If you do not provide word borders, a pattern matches any word that contains it.
Word borders are defined with \b. It is also possible to require that there is no word border with \B.
Examples:
- code matches: "Unicode is nice" and "Spaghetti code is not nice"
- \bcode\b matches: "Spaghetti code is not nice", but not "Unicode is nice".
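- \Bcode matches: "Unicode is nice", but not "Spaghetti code is not nice".
- \Bcode\B matches: "decoders", but not "Unicode is nice", because there "code" ends at a word border.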
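The effect of \b and \B can be checked with a small sketch. It assumes Python's re module as a stand-in for QRegExp, and the helper found is a hypothetical name. Unlike a full match, it reports whether the pattern occurs anywhere in the text, which is how a search behaves.

```python
import re

def found(pattern, text):
    """Return True if the pattern occurs anywhere in the text (Python re, not QRegExp)."""
    return re.search(pattern, text) is not None

# Without borders, "code" is found inside other words too.
print(found(r"code", "Unicode is nice"))                  # True

# \b requires a word border on that side of the pattern.
print(found(r"\bcode\b", "Spaghetti code is not nice"))   # True
print(found(r"\bcode\b", "Unicode is nice"))              # False

# \B requires that there is no word border on that side.
print(found(r"\Bcode", "Unicode is nice"))                # True
print(found(r"\Bcode", "Spaghetti code is not nice"))     # False
print(found(r"\Bcode\B", "decoders"))                     # True
print(found(r"\Bcode\B", "Spaghetti code is not nice"))   # False
```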
- String borders
- Usually a pattern counts as a match if it appears anywhere inside a string (a "text").
If you want the regex to match only exact strings, you have to surround it with ^ and $.
^ marks the start, $ the end.
Example:
- ^code$ matches: "code", but not "Spaghetti code is not nice".
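The difference between a pattern with and without string borders can be sketched as follows. Python's re module is assumed as a stand-in for QRegExp, and found is a hypothetical helper name.

```python
import re

def found(pattern, text):
    """Return True if the pattern occurs anywhere in the text (Python re, not QRegExp)."""
    return re.search(pattern, text) is not None

# Without ^ and $, the pattern may occur anywhere in the string.
print(found(r"code", "Spaghetti code is not nice"))     # True

# With ^ and $, the string must consist of the pattern alone.
print(found(r"^code$", "code"))                         # True
print(found(r"^code$", "Spaghetti code is not nice"))   # False

# Using only one border anchors only that side.
print(found(r"^code", "code is nice"))                  # True
print(found(r"code$", "code is nice"))                  # False
```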
- Combining
- It is possible to combine several regexes into one regex by surrounding them with parentheses and
separating them with |, which means "or".
Example:
- \b(rail|way)\b matches "rail" and "way", but not "railway".
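A sketch of combining alternatives, assuming Python's re module in place of QRegExp. The helper which_alternative is a hypothetical name. It returns the alternative that matched, or None if there was no match.

```python
import re

def which_alternative(text):
    """Return the matched alternative of \\b(rail|way)\\b, or None (Python re, not QRegExp)."""
    match = re.search(r"\b(rail|way)\b", text)
    return match.group(1) if match else None

print(which_alternative("rail"))          # rail
print(which_alternative("way"))           # way
print(which_alternative("railway"))       # None, neither word stands alone
print(which_alternative("a long way"))    # way
```

The word borders apply to the whole parenthesized group, which is why "railway" is rejected even though it contains both alternatives.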
- Replacement Characters
- There are several replacements available that stand for letters, whitespace, numbers, ...
To match any character with a special meaning, like ^, use \ to escape it: \^ (and \\ to match \).
Here is a small list of the wildcards you might want to use in this application:
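A minimal sketch of escaping and of three such replacements: \d for a digit, \s for whitespace and \w for a word character. That these three are among the ones meant here is an assumption. Python's re module stands in for QRegExp, and the helper names found and matches_fully are hypothetical.

```python
import re

def found(pattern, text):
    """Return True if the pattern occurs anywhere in the text (Python re, not QRegExp)."""
    return re.search(pattern, text) is not None

def matches_fully(pattern, text):
    """Return True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# An unescaped ^ marks the start of the string, an escaped \^ is a literal caret.
print(found(r"^10", "2^10"))       # False, the string does not start with "10"
print(found(r"\^10", "2^10"))      # True

# \\ in the pattern matches a single backslash in the text.
print(found(r"\\", "C:\\temp"))    # True

# Assumed examples of replacement characters.
print(matches_fully(r"\d+", "2024"))      # True, digits only
print(matches_fully(r"\d+", "20a4"))      # False
print(matches_fully(r"\w+", "code_1"))    # True, letters, digits and underscore
print(matches_fully(r"\s+", " \t"))       # True, space and tab are whitespace
```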
- Assertions
- Assertions allow you to specify what the text around a pattern should look like. (?= ) is called
positive lookahead. It asserts that the pattern is followed by another pattern. (?! ) is called
negative lookahead. It asserts that the pattern is not followed by another pattern.
Examples:
- Spaghetti(?=\s+code) matches: "Spaghetti code", but not "Spaghetti alla bolognese"
- Spaghetti(?!\s+code) matches: "Spaghetti alla bolognese", but not "Spaghetti code"
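The two lookaheads can be sketched like this, assuming Python's re module as a stand-in for QRegExp. The helper matched_text is a hypothetical name. It returns the text that was actually matched, which shows that the lookahead itself is not part of the match.

```python
import re

def matched_text(pattern, text):
    """Return the matched part of the text, or None (Python re, not QRegExp)."""
    match = re.search(pattern, text)
    return match.group(0) if match else None

# Positive lookahead: "Spaghetti" must be followed by whitespace and "code".
print(matched_text(r"Spaghetti(?=\s+code)", "Spaghetti code"))             # Spaghetti
print(matched_text(r"Spaghetti(?=\s+code)", "Spaghetti alla bolognese"))   # None

# Negative lookahead: "Spaghetti" must not be followed by whitespace and "code".
print(matched_text(r"Spaghetti(?!\s+code)", "Spaghetti alla bolognese"))   # Spaghetti
print(matched_text(r"Spaghetti(?!\s+code)", "Spaghetti code"))             # None
```

In both successful cases only "Spaghetti" is returned: the text inside the assertion is checked but not consumed.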