Regular Expressions

The search functions in the graphical user interface and the console allow the use of regular expressions. This document provides some help on how regular expressions are constructed. The regular expressions are evaluated by the class QRegExp of the Qt framework. The documentation of QRegExp is available at http://doc.qt.nokia.com/qt-maemo-4.6/qregexp.html. It provides many additional explanations, e.g. some hints for Perl users.

Creating Regular Expressions

Regular expressions consist of characters, sets of characters (e.g. ranges like A-Z), quantifiers and placeholders (wildcards). In combination these elements form a pattern that matches words (or texts) with the same structure. So that this text can also serve as a reference, the elements are described below in the form of a list.
Characters
Every character stands for itself when it is used without any of the following constructs.

Character Sets
Sets of characters are surrounded by [] and match exactly one character out of the set. Ranges are defined with a dash. A set can be negated by placing ^ directly after the opening bracket. It then matches any character except those listed.
 
Examples:
  • [abc] matches: "a", "b", or "c".
  • [A-Z] matches: any single capital letter from "A" to "Z".
  • [^a-d] matches: any single character that is not "a", "b", "c" or "d".
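
The set examples above can be tried out with a short script. This is only a sketch: it uses Python's re module as a stand-in for QRegExp (an assumption, the application itself uses Qt), and the sample list is made up for illustration. The three patterns behave the same way in both engines.

```python
import re

# Python's re module stands in for QRegExp here (assumption).
# The sample characters are hypothetical test data.
samples = ["a", "c", "d", "Q", "q", "7"]

# fullmatch() requires the pattern to cover the whole sample.
in_abc = [s for s in samples if re.fullmatch(r"[abc]", s)]
capitals = [s for s in samples if re.fullmatch(r"[A-Z]", s)]
not_a_to_d = [s for s in samples if re.fullmatch(r"[^a-d]", s)]

print(in_abc)      # ['a', 'c']
print(capitals)    # ['Q']
print(not_a_to_d)  # ['Q', 'q', '7']
```

Note that the negated set also accepts the digit "7": it excludes only the listed characters, not all non-letters.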

Quantifiers
Quantifiers are defined by {}. They can have one or two values: {x}, {x,y}. With one value the quantifier matches exactly x repetitions. Two values represent a minimal and a maximal quantity: {<min>,<max>}. Either value can be left open: an omitted min is interpreted as 0, an omitted max means that there is no upper limit. A quantifier is placed directly behind the element it quantifies. The examples below refer to a match of the complete text.
 
Examples:
  • A{3} matches: "AAA", nothing else.
  • A{1,3} matches: "A", "AA", "AAA", but not "AAAA".
  • [0-9]{1,1} matches: "0","1",...,"9", but not "13".
  • A{,3} matches: "" (the empty text), "A", "AA", "AAA", but not "AAAA",....
  • A{3,} matches: "AAA", "AAAA",..., but not "AA", "A", "".

There are also these shortcuts for some quantifiers:
  •  ?     Matches zero or one (same as {0,1}) 
  •  +     Matches one or more (same as {1,}) 
  •  *     Matches zero or more (same as {0,}) 
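
The quantifier rules can be checked with a few lines of Python. This is a minimal sketch that assumes Python's re module behaves like QRegExp for these patterns, and the helper whole() is a hypothetical name introduced only for this example. It tests whether a pattern covers the complete text, which is how the examples above are meant.

```python
import re

def whole(pattern, text):
    """Return True if the pattern matches the complete text (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

print(whole(r"A{3}", "AAA"))     # True
print(whole(r"A{1,3}", "AAAA"))  # False, one "A" too many
print(whole(r"A{,3}", ""))       # True, an open min counts as 0
print(whole(r"A{3,}", "AAAAA"))  # True, an open max means no upper limit
print(whole(r"A{3,}", "AA"))     # False

# The shortcuts behave like their {} counterparts.
shortcuts_agree = all(
    whole(short, text) == whole(full, text)
    for short, full in [(r"A?", r"A{0,1}"), (r"A+", r"A{1,}"), (r"A*", r"A{0,}")]
    for text in ["", "A", "AA"]
)
print(shortcuts_agree)           # True
```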

Wildcards
Wildcards are placeholders for one or more arbitrary characters. "." stands for exactly one arbitrary character. ".*" stands for any number of arbitrary characters, including none.
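
A brief sketch of the two wildcards, again with Python's re module as an assumed stand-in for QRegExp. The sample words are invented, and line breaks are avoided on purpose because the two engines may treat them differently for ".".

```python
import re

# "." takes the place of exactly one character.
one_char_ok = re.fullmatch(r"c.t", "cat") is not None
one_char_too_many = re.fullmatch(r"c.t", "cart") is not None

# ".*" takes the place of any number of characters, including none.
zero_chars = re.fullmatch(r"c.*t", "ct") is not None
many_chars = re.fullmatch(r"c.*t", "carrot") is not None

print(one_char_ok, one_char_too_many)  # True False
print(zero_chars, many_chars)          # True True
```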

Word borders
Without word borders, a pattern also matches when it occurs only as a part of a longer word. A word border is defined with \b. The opposite is \B, which matches only at positions that are not a word border, i.e. inside a word.
 
Examples:
  • code matches: "Unicode is nice" and "Spaghetti code is not nice"
  • \bcode\b matches: "Spaghetti code is not nice", but not "Unicode is nice".
  • \Bcode\B matches: "A decoder is nice", but not "Unicode is nice" (a border follows "code") or "Spaghetti code is not nice".
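
The effect of \b and \B can be observed with a small search helper. The sketch below assumes Python's re module as a stand-in for QRegExp, and found() is a hypothetical helper. The sentence with "decoder" is an extra sample in which "code" has letters on both sides.

```python
import re

def found(pattern, text):
    """Return True if the pattern occurs anywhere in the text (hypothetical helper)."""
    return re.search(pattern, text) is not None

print(found(r"code", "Unicode is nice"))                  # True
print(found(r"\bcode\b", "Spaghetti code is not nice"))   # True
print(found(r"\bcode\b", "Unicode is nice"))              # False
print(found(r"\Bcode\B", "A decoder is nice"))            # True
print(found(r"\Bcode\B", "Spaghetti code is not nice"))   # False
```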

String borders
Usually it is enough for the pattern to appear somewhere inside a string (a "text") to count as a match. If you want the regex to match only the exact string, you have to surround it with ^ and $. ^ marks the start of the string, $ the end.
 
Example:
  • ^code$ matches: "code", but not "Spaghetti code is not nice".
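
As a quick illustration, the following sketch compares an anchored and an unanchored pattern. It assumes Python's re module in place of QRegExp. The variable names are made up for this example.

```python
import re

anchored = re.compile(r"^code$")
unanchored = re.compile(r"code")

exact = anchored.search("code") is not None
inside_anchored = anchored.search("Spaghetti code is not nice") is not None
inside_unanchored = unanchored.search("Spaghetti code is not nice") is not None

print(exact)              # True
print(inside_anchored)    # False, the text has more than just "code"
print(inside_unanchored)  # True
```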

Combining
It is possible to combine several regexes into one by surrounding them with parentheses and separating them with |, which means "or".
 
Example:
  • \b(rail|way)\b matches "rail" and "way", but not "railway".
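
The alternation can be tried out as follows. This is a sketch with Python's re module as an assumed stand-in for QRegExp, and the sample sentences are invented. group(1) returns the alternative that actually matched.

```python
import re

pattern = re.compile(r"\b(rail|way)\b")

first = pattern.search("the rail is old")
second = pattern.search("this way, please")
third = pattern.search("railway")

print(first.group(1))   # rail
print(second.group(1))  # way
print(third)            # None, neither part stands alone as a word
```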

Replacement Characters
Several replacement characters stand for whole classes of characters such as digits, whitespace or word characters, and for control characters. To match a character that has a special meaning, like ^, use \ to escape it: \^ (and \\ to match \). Here is a small list of the replacements you might want to use in this application:
  • \a         ASCII bell
  • \f         Form feed
  • \r         Carriage return
  • \t         Horizontal tab
  • \d         Any digit
  • \D         Any non-digit
  • \s         Whitespace
  • \S         Non-whitespace
  • \w         Word character (letter, digit or underscore)
  • \W         Non-word character
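
A few of these replacements in use, as a sketch under the assumption that Python's re module treats them like QRegExp does for plain ASCII text. The sample strings are invented for illustration.

```python
import re

# \d+ collects runs of digits.
numbers = re.findall(r"\d+", "Room 42, floor 7")

# \s+ splits at any run of whitespace, including tabs.
parts = re.split(r"\s+", "a  b\tc")

# \w+ collects runs of word characters (letters, digits, underscore).
words = re.findall(r"\w+", "snake_case, 2 words!")

# An escaped ^ is matched literally instead of marking the string start.
caret = re.search(r"\^", "2^8") is not None

print(numbers)  # ['42', '7']
print(parts)    # ['a', 'b', 'c']
print(words)    # ['snake_case', '2', 'words']
print(caret)    # True
```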

Assertions
Assertions allow you to specify what the text following a pattern has to look like, without making that text part of the match. (?= ) is called positive lookahead. It asserts that the pattern is followed by another pattern. (?! ) is called negative lookahead. It asserts that the pattern is not followed by another pattern.
 
Examples:
  • Spaghetti(?=\s+code) matches: "Spaghetti code", but not "Spaghetti alla bolognese"
  • Spaghetti(?!\s+code) matches: "Spaghetti alla bolognese", but not "Spaghetti code"
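
The two lookahead examples can be reproduced with the sketch below, which assumes Python's re module as a stand-in for QRegExp. It also shows that the text inside the assertion is only checked and does not become part of the match.

```python
import re

followed = re.compile(r"Spaghetti(?=\s+code)")
not_followed = re.compile(r"Spaghetti(?!\s+code)")

hit = followed.search("Spaghetti code")
print(hit.group(0))                                      # Spaghetti
print(followed.search("Spaghetti alla bolognese"))       # None

other = not_followed.search("Spaghetti alla bolognese")
print(other.group(0))                                    # Spaghetti
print(not_followed.search("Spaghetti code"))             # None
```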