This class provides Regular Expression matching and substitution for Synergy/DE. It requires Synergy/DE version 9.1.5 or higher.
This page documents Regex version 1.0 (Download source)
Author: Chip Camden
Words in italics indicate an instance of a class. The word corresponds to the class name, except where more than one instance is represented in the same statement. In that case a number (2, 3, etc.) is appended to the class name.
Words in normal typeface are to be taken literally (required punctuation, class name in a static reference, method name, etc.).
The symbol => is used to separate an expression (on the left) from its return value (on the right).
An ellipsis (...) indicates that the previous argument may be repeated any number of times. The description will indicate whether one instance is required.
Regex.from(object) => regex
R$(object) => regex
new Regex(object) => regex
Creates a new Regex from the string representation of object, which may be an alpha expression or any class of object. The resulting string will be compiled immediately, which may throw a RegexException if the syntax is incorrect.
If you .include "chipstips.def", then you may use R$ as a syntactic shortcut for Regex.from.
Regex.GlobalSearch => boolean
Regex.GlobalSearch = boolean
Regex.IgnoreCase => boolean
Regex.IgnoreCase = boolean
Regex.LastMatch => MatchData
Regex.match(a) => MatchData
Regex.match(a, int) => MatchData
Regex.Multiline => boolean
Regex.Multiline = boolean
Multiline controls how match (and consequently replace as well) treats a newline character (char(10)) embedded in the string being searched. It may be initially set to true via the "m" option on the expression, or you may alter it here.
When Multiline is true, a newline character is treated as an end of line and a beginning of line, so it matches '^' and '$', but cannot be matched by any other pattern (even '.' or '\x0A'). When false, newline is treated just like any other character.
Regex.replace(a, a2) => string
Regex.replace(a, a2, int) => string
Replaces the substring of a that matches the regular expression with a2. If the GlobalSearch option is enabled (the 'g' option on the expression), then all matches will be replaced -- otherwise only the first one will be.
In the second form, int specifies the beginning position within a at which substitutions may occur.
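The difference between replacing only the first match and replacing every match can be sketched with Python's standard re module. This is only an analogue, not the Synergy/DE Regex class: the helper name replace_demo is hypothetical, and it assumes that the count argument of re.sub plays the role the 'g' option plays here.

```python
import re

def replace_demo(pattern, text, replacement, global_search):
    """Hypothetical helper: mimic GlobalSearch on/off via re.sub's count."""
    # In re.sub, count=0 replaces every match; count=1 stops after the first.
    count = 0 if global_search else 1
    return re.sub(pattern, replacement, text, count=count)

first_only = replace_demo(r"a", "banana", "o", global_search=False)
all_matches = replace_demo(r"a", "banana", "o", global_search=True)

print(first_only)   # bonana
print(all_matches)  # bonono
```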
If a2 contains any '\' characters, the following special substitutions will be performed:
Regex.ToString() => string
(a)Regex => string
An instance of the MatchData class is returned by the match method, and is also available in the public member LastMatch. It describes a regular expression match, or lack thereof.
Public members:
The string from which a regular expression may be constructed must have the general form:
<delimiter><match><delimiter><options>
where:
<delimiter> is any single character except '\'. Both delimiters must match.
Most characters contained within <match> must follow one another immediately in the target string in order to match. For instance, R$('/xyz/') matches 'abcdefghijklmnopqrstuvwxyz' at position 24, but it does not match 'xzy' or 'x y z'.
Some characters, however, have special meaning within a regular expression. If you want to include those characters literally, you must escape them with a preceding '\'. You can also include your delimiter in the same way -- e.g., R$('/\//') matches a '/'. Following is a list of all special characters supported by this implementation, and their meanings.
The match must end at the last character of a line of text in order to be accepted. If Multiline is true, then an embedded newline character qualifies as marking the end of a line, but the end of the string always qualifies.
Example: R$('/end$/') matches 'friend', but not 'friends'.
Parentheses can be used to override the usual operator precedence by grouping operations together. Additionally, parenthesized sub-expressions are counted from left to right (by their open parenthesis) to number them from 1 to the number of sub-expressions encountered. The text that matched each sub-expression can be accessed from the MatchData object returned from the method match by indexing it. Thus for example,
match = R$('/.*(c)/').match("abcd")
assert(match[1].start == 3)
assert(match[1].length == 1)
Additionally, in the replace method, the replacement string may contain escaped references to these sub-expressions.
If a sub-expression is completed more than once within a match, the last one wins. For overlapping sub-expression matches, the longest one wins.
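The numbering of sub-expressions and the "last one wins" rule can be tried out in Python's standard re module, which is used here only as a stand-in for the Synergy/DE class. Two assumptions: the delimiters are dropped because Python takes the bare pattern, and 1 is added to Python's 0-based offsets to get the 1-based positions used on this page.

```python
import re

# Same pattern and subject as the example above, minus the delimiters.
m = re.search(r".*(c)", "abcd")
start_1based = m.start(1) + 1          # Python offsets are 0-based
length = m.end(1) - m.start(1)

# A group that completes more than once keeps only its last capture.
repeated = re.search(r"(a|b)+", "ab")
last_capture = repeated.group(1)

print(start_1based, length, last_capture)  # 3 1 b
```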
Also known as a Kleene closure, this operator matches as many of the preceding expression as possible (greedy search), but can match none of them, depending on the constraints of the rest of the expression.
Example: R$('/A*B/') matches 'B', 'AB', 'ZAAB', and even 'AZB' (because the final 'B' matches the "zero A's followed by B" case).
This operator matches at least one, but as many as possible (greedy search), of the preceding expression.
Example: R$('/A+B/') matches 'AB' and 'ZAAB', but not 'B' or 'AZB'.
This metacharacter matches any character. It's often used to skip over text whose content you don't care about. But be careful of searches that are greedier than you intended.
Example: R$('/A.B/') matches 'AAB' and 'ACB', but not 'AB'.
R$('/.*B/') matches 'ABRACADABRA' at position 1, but the length of
the match is 9 (inclusive of the second 'B').
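To see how far a greedy '.*' reaches, here is the same search in Python's standard re module (an analogue only, not the Synergy/DE class; positions are converted from Python's 0-based offsets). The second search is one possible way to rein it in, using a negated character class so that the skipped text cannot run past the first 'B'.

```python
import re

subject = "ABRACADABRA"

# Greedy: '.*' swallows as much as it can and still leaves a 'B' to match.
greedy = re.search(r".*B", subject)
greedy_start = greedy.start() + 1              # 1-based position
greedy_length = greedy.end() - greedy.start()

# Narrower: '[^B]*' cannot cross a 'B', so the match ends at the first one.
narrow = re.search(r"[^B]*B", subject)
narrow_length = narrow.end() - narrow.start()

print(greedy_start, greedy_length, narrow_length)  # 1 9 2
```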
This operator matches one or none of the preceding expression.
Example: R$('/A?B/') matches 'B', 'AB', 'ZAAB' (at position 3) and even 'AZB' (also at position 3).
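The three repetition operators are easiest to compare side by side on the same four strings. The sketch below uses Python's standard re module as an analogue (it is not the Synergy/DE class); the helper name positions is hypothetical, and it reports 1-based positions to agree with this page.

```python
import re

subjects = ["B", "AB", "ZAAB", "AZB"]

def positions(pattern):
    """Hypothetical helper: 1-based match position per subject, or None."""
    result = {}
    for s in subjects:
        m = re.search(pattern, s)
        result[s] = m.start() + 1 if m else None
    return result

star = positions(r"A*B")      # zero or more A's, then B
plus = positions(r"A+B")      # at least one A, then B
optional = positions(r"A?B")  # at most one A, then B

print(star)      # {'B': 1, 'AB': 1, 'ZAAB': 2, 'AZB': 3}
print(plus)      # {'B': None, 'AB': 1, 'ZAAB': 2, 'AZB': None}
print(optional)  # {'B': 1, 'AB': 1, 'ZAAB': 3, 'AZB': 3}
```

Note how 'ZAAB' matches at position 2 for '*' and '+', but at position 3 for '?', because '?' can only take one of the two A's.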
From this character up to the matching ']' forms a specification of a character class. In its simplest form, it's just a list of the characters that qualify. For example: R$('/[xyz]/') matches "x", "y" or "z". In this form, it's synonymous with R$('/x|y|z/').
If the first character within the brackets is "^", then the sense of the character class is reversed. That is, anything except these characters. For example, R$('/[^abc]/') will not match any of the first three lower-case letters of the alphabet.
The back-slash can be used to introduce special characters into the sequence, as detailed below. Thus, R$('/[\d.]/') matches any numeric digit or a period. Note that the period (along with most special characters) has a literal meaning when used within a character class.
A dash ('-') can be used to specify a range of characters. For example, R$('/[A-Z]/') matches all uppercase letters.
Once introduced, a character class becomes an expression like any other character. Thus, it can be repeated with '*', '+', or '?'. For example, '/[A-Za-z]*/' matches any number of letters.
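The character class forms described above can be checked quickly in Python's standard re module. It is used here as an analogue only: it is not the Synergy/DE class, and the sample strings are made up for illustration.

```python
import re

# Simple list: the first character that is x, y or z.
simple = re.search(r"[xyz]", "play").group()

# Negated class: the first character that is not a, b or c.
negated = re.search(r"[^abc]", "cabd").group()

# Escape plus a literal period inside the class.
digit_or_dot = re.findall(r"[\d.]", "v1.5")

# Range of characters.
upper_range = re.findall(r"[A-Z]", "Synergy/DE")

# A class followed by '*' repeats like any other expression.
letters_run = re.search(r"[A-Za-z]*", "Regex10").group()

print(simple, negated, digit_or_dot, upper_range, letters_run)
```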
The back-slash can be used to insert the character that follows, without special interpretation. However, there are some other characters that have special meaning when following a back-slash:
The match must begin with the first character of a line of text in order to be accepted. If Multiline is true, then an embedded newline character qualifies as marking the beginning of a line, but the beginning of the string always qualifies.
Example: R$('/^front/') matches 'front-end', but not 'affront'.
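Both anchors ('^' here and '$' earlier in this list) can be exercised in Python's standard re module. This is a sketch under two assumptions: the re module merely stands in for the Synergy/DE class, and Python's re.MULTILINE flag is taken as the counterpart of the Multiline option for anchoring purposes only. The helper name matches is hypothetical.

```python
import re

def matches(pattern, subject, multiline=False):
    """Hypothetical helper: True if pattern is found anywhere in subject."""
    flags = re.MULTILINE if multiline else 0
    return re.search(pattern, subject, flags) is not None

start_ok = matches(r"^front", "front-end")
start_no = matches(r"^front", "affront")
end_ok = matches(r"end$", "friend")
end_no = matches(r"end$", "friends")

# An embedded newline only starts a new line when multiline is on.
line_off = matches(r"^front", "back\nfront")
line_on = matches(r"^front", "back\nfront", multiline=True)

print(start_ok, start_no, end_ok, end_no, line_off, line_on)
# True False True False False True
```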
This operator occurs between two expressions to specify that either one or the other is required. It has the lowest precedence of any operator, so to prevent everything on one side or the other from being lumped together, use parentheses.
Example: R$('/c(a|u)t/') matches 'cat' and 'cut' at position 1.
But R$('/ca|ut/') matches 'cut' at position 2 ("ut").
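The effect of the parentheses is easy to confirm in Python's standard re module (an analogue only, not the Synergy/DE class; 1 is added to Python's 0-based offsets to get the positions quoted above).

```python
import re

# Grouped: 'c', then either 'a' or 'u', then 't'.
grouped = re.search(r"c(a|u)t", "cut")
grouped_pos = grouped.start() + 1

# Ungrouped: either 'ca' or 'ut', because '|' binds last.
ungrouped = re.search(r"ca|ut", "cut")
ungrouped_pos = ungrouped.start() + 1
ungrouped_text = ungrouped.group()

print(grouped_pos, ungrouped_pos, ungrouped_text)  # 1 2 ut
```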
The following features that are present in some Regular Expression parsers have not yet been implemented here: