Regular Expressions Quick Reference
Several DeltaWalker features, namely the file and folder comparison filters and the Find/Replace dialog, use regular expressions to search for and match text. Regular expressions let you describe a wide variety of patterns and state precisely which text should match. Because they are widely adopted and extensively documented, they are a preferred choice for these tasks.
This section gives a brief introduction to the regular expression syntax. For additional pointers, please see the references listed in the See Also section below.
Literals
All characters except those listed below are interpreted literally, as themselves. The listed characters have special meanings and are interpreted literally only when escaped with a backslash (\) placed immediately before them:
\ . [ ] ^ $ ? * + { } | ( )
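The syntax in this reference matches that of Java's java.util.regex.Pattern class. Assuming DeltaWalker uses the same dialect, the minimal sketch below shows the escaping rule in practice. The class and method names are hypothetical. Note that in Java source code every regex backslash must itself be doubled inside a string literal; when typing into a DeltaWalker filter or the Find/Replace dialog, you write a single backslash.

```java
import java.util.regex.Pattern;

// Sketch only: assumes DeltaWalker's regex dialect behaves like java.util.regex.Pattern.
// EscapeDemo and fullMatch are hypothetical names used for illustration.
class EscapeDemo {
    // True if the regex matches the entire input string.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Java string literals double each backslash: "1\\+1=2" is the regex 1\+1=2
        System.out.println(fullMatch("1\\+1=2", "1+1=2")); // true: \+ is a literal plus sign
        System.out.println(fullMatch("1+1=2", "1+1=2"));   // false: unescaped + is a quantifier
        System.out.println(fullMatch("1+1=2", "11=2"));    // true: 1+ matches the first 1, then 1=2
        // Pattern.quote escapes an entire string in one step
        System.out.println(fullMatch(Pattern.quote("(a|b)*"), "(a|b)*")); // true
    }
}
```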
Literal escapes
The following table lists special uses of the backslash (\) character in combination with other literals to match specific characters:
Construct | Matches |
---|---|
\t | The tab character |
\n | The newline (i.e. line-feed) character |
\r | The carriage-return character |
\f | The form-feed character |
\a | The bell (i.e. alert) character |
\e | The escape character |
\0n | The character with octal value 0n (0 <= n <= 7) |
\0nn | The character with octal value 0nn (0 <= n <= 7) |
\0mnn | The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7) |
\xhh | The character with hexadecimal value 0xhh |
\uhhhh | The character with hexadecimal value 0xhhhh |
\cx | The control character corresponding to x |
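The escapes in the table above can be checked with a short Java sketch, assuming DeltaWalker interprets them the way java.util.regex.Pattern does (the table mirrors Pattern's documentation). Each regex is written as a Java string literal, so its backslash appears doubled; LiteralEscapeDemo and fullMatch are hypothetical names.

```java
import java.util.regex.Pattern;

// Sketch only: assumes DeltaWalker's escapes behave like those of java.util.regex.Pattern.
// LiteralEscapeDemo and fullMatch are hypothetical names.
class LiteralEscapeDemo {
    // True if the regex matches the entire input string.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("a\\tb", "a\tb"));     // true: regex tab escape matches a real tab
        System.out.println(fullMatch("\\x41", "A"));        // true: hex 0x41 is 'A'
        System.out.println(fullMatch("\\u00e9", "\u00e9")); // true: hex 0x00e9 is e-acute
        System.out.println(fullMatch("\\0101", "A"));       // true: octal 0101 = 65 = 'A'
        System.out.println(fullMatch("\\cI", "\t"));        // true: Ctrl-I is the tab character
    }
}
```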
Character classes
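The dot (.) matches any single character, although depending on the matching mode it may not match line terminators. It is the simplest and most widely used of the so-called character classes: sub-expressions with a compact syntax that match one character from a set. Other character class constructs are: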
Construct | Matches |
---|---|
[abc] | a, b, or c (simple class) |
[^abc] | Any character except a, b, or c (negation) |
[a-zA-Z] | a through z or A through Z, inclusive (range) |
[a-d[m-p]] | a through d, or m through p: [a-dm-p] (union) |
[a-z&&[def]] | d, e, or f (intersection) |
[a-z&&[^bc]] | a through z, except for b and c: [ad-z] (subtraction) |
[a-z&&[^m-p]] | a through z, and not m through p: [a-lq-z] (subtraction) |
\d | A digit: [0-9] |
\D | A non-digit: [^0-9] |
\s | A whitespace character: [ \t\n\x0B\f\r] |
\S | A non-whitespace character: [^\s] |
\w | A word character: [a-zA-Z_0-9] |
\W | A non-word character: [^\w] |
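A minimal Java sketch of the constructs above, assuming DeltaWalker's character classes behave like those of java.util.regex.Pattern, whose syntax the table mirrors. The class and helper names are hypothetical. The second line shows Java's default behavior for the dot; whether DeltaWalker enables a mode in which the dot also matches line terminators is not stated here.

```java
import java.util.regex.Pattern;

// Sketch only: assumes DeltaWalker's character classes behave like java.util.regex.Pattern's.
// CharClassDemo and fullMatch are hypothetical names.
class CharClassDemo {
    // True if the regex matches the entire input string.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("a.c", "abc"));          // true: . matches 'b'
        System.out.println(fullMatch("a.c", "a\nc"));         // false: in Java's default mode . excludes line terminators
        System.out.println(fullMatch("[a-d[m-p]]", "n"));     // true: union
        System.out.println(fullMatch("[a-d[m-p]]", "h"));     // false: h is in neither range
        System.out.println(fullMatch("[a-z&&[def]]", "e"));   // true: intersection
        System.out.println(fullMatch("[a-z&&[^bc]]", "b"));   // false: subtraction removes b
        System.out.println(fullMatch("\\d\\d-\\w+", "42-snake_case")); // true: two digits, hyphen, word chars
    }
}
```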
Boundary matchers
The ^ character has two special meanings. The first, shown above, is negating a character class. The second, also in wide use, is marking the beginning of a line: ^ does not match an actual character but the position where a line starts. Other expressions that mark boundaries are:
Construct | Matches |
---|---|
$ | The end of a line |
\b | A word boundary |
\B | A non-word boundary |
\A | The beginning of the input |
\G | The end of the previous match |
\Z | The end of the input but for the final terminator, if any |
\z | The end of the input |
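The Java sketch below illustrates several boundary matchers, assuming DeltaWalker follows java.util.regex.Pattern semantics. In Java, ^ and $ match at every line only when the MULTILINE flag is set; how DeltaWalker configures this flag is not stated here, so the sketch enables it explicitly. BoundaryDemo and findAll are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: assumes java.util.regex boundary semantics; names are hypothetical.
class BoundaryDemo {
    // Collects the text of every successive match of p in input.
    static List<String> findAll(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // \b stops "cat" from matching inside "concatenate"
        System.out.println(findAll(Pattern.compile("\\bcat\\b"), "cat concatenate cat.")); // [cat, cat]
        // With MULTILINE, ^ matches at the start of every line
        Pattern lineStart = Pattern.compile("^\\w+", Pattern.MULTILINE);
        System.out.println(findAll(lineStart, "alpha one\nbeta two"));  // [alpha, beta]
        // \A matches only at the start of the input, even with MULTILINE
        Pattern inputStart = Pattern.compile("\\A\\w+", Pattern.MULTILINE);
        System.out.println(findAll(inputStart, "alpha one\nbeta two")); // [alpha]
    }
}
```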
Quantifiers
Quantifiers specify how many times an expression must repeat its match. A quantifier is placed immediately after the expression it applies to. The following table lists commonly used quantified forms, where X stands for any expression:
Construct | Matches |
---|---|
X? | X, once or not at all |
X* | X, zero or more times |
X+ | X, one or more times |
X{n} | X, exactly n times |
X{n,} | X, at least n times |
X{n,m} | X, at least n but not more than m times |
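Each quantifier form in the table can be exercised with a short Java sketch, assuming DeltaWalker's quantifiers behave like those of java.util.regex.Pattern. QuantifierDemo and fullMatch are hypothetical names; each call tests whether the whole input matches.

```java
import java.util.regex.Pattern;

// Sketch only: assumes java.util.regex quantifier semantics; names are hypothetical.
class QuantifierDemo {
    // True if the regex matches the entire input string.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("colou?r", "color"));  // true: ? allows zero u
        System.out.println(fullMatch("colou?r", "colour")); // true: ? allows one u
        System.out.println(fullMatch("ab*c", "ac"));        // true: * allows zero b
        System.out.println(fullMatch("ab+c", "ac"));        // false: + needs at least one b
        System.out.println(fullMatch("\\d{4}", "1234"));    // true: exactly four digits
        System.out.println(fullMatch("\\d{2,}", "7"));      // false: at least two digits required
        System.out.println(fullMatch("x{2,3}", "xxxx"));    // false: at most three x allowed
    }
}
```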
Logical alternation
When a match at a given position may satisfy any one of several expressions, the | character separates the alternatives. For example, matching either X or Y is written as:
X|Y
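A small Java sketch of alternation, assuming DeltaWalker's | behaves like that of java.util.regex.Pattern. It also shows that, without parentheses, | splits the entire expression: ab|cd means "ab" or "cd", not "a, then b or c, then d". AlternationDemo and fullMatch are hypothetical names.

```java
import java.util.regex.Pattern;

// Sketch only: assumes java.util.regex alternation semantics; names are hypothetical.
class AlternationDemo {
    // True if the regex matches the entire input string.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("cat|dog", "dog")); // true: second alternative
        System.out.println(fullMatch("cat|dog", "cow")); // false: neither alternative
        // | applies to everything on each side of it
        System.out.println(fullMatch("ab|cd", "cd"));    // true
        System.out.println(fullMatch("ab|cd", "acd"));   // false
    }
}
```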
Groups
Parentheses group the elements of the regular expression into distinct sub-expressions so that quantifiers and logical alternation can be applied to them.
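The following Java sketch contrasts quantifiers and alternation with and without grouping, assuming DeltaWalker's parentheses behave like those of java.util.regex.Pattern. GroupDemo and fullMatch are hypothetical names.

```java
import java.util.regex.Pattern;

// Sketch only: assumes java.util.regex grouping semantics; names are hypothetical.
class GroupDemo {
    // True if the regex matches the entire input string.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Without parentheses, + applies only to the preceding 'b'
        System.out.println(fullMatch("ab+", "ababab"));   // false
        // With parentheses, + applies to the whole group "ab"
        System.out.println(fullMatch("(ab)+", "ababab")); // true
        // A quantifier with an exact count applied to a group
        System.out.println(fullMatch("(ab){2}", "abab")); // true
        // Alternation confined to the group: a, then b or c, then d
        System.out.println(fullMatch("a(b|c)d", "acd"));  // true
    }
}
```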