@@ -4,6 +4,8 @@ Package regex implements regular expression search, using a custom non-backtrac
The engine works entirely on UTF-8 code points, so it can match characters
from other languages, as well as emoji and symbols.
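The standard library's [regexp] also matches on UTF-8 code points, so it can show what that means in practice. A character such as an emoji spans several bytes but counts as one code point, and '.' consumes it whole. This sketch uses only the standard library, not this package's API.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

// runeStats reports the byte length of s and its number of UTF-8 code
// points (runes). A code-point-based engine treats each rune as one character.
func runeStats(s string) (bytes, runes int) {
	return len(s), utf8.RuneCountInString(s)
}

// singleChar matches exactly one character. Because regexp works on
// runes, '.' consumes a whole multi-byte code point.
var singleChar = regexp.MustCompile(`^.$`)

func main() {
	// ASCII letter, accented letter, CJK ideograph, emoji.
	for _, s := range []string{"a", "\u00e9", "\u65e5", "\U0001F600"} {
		b, r := runeStats(s)
		fmt.Printf("%q: %d bytes, %d code point(s), one character: %v\n",
			s, b, r, singleChar.MatchString(s))
	}
}
```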

The API and regex syntax are largely compatible with those of the standard library's [regexp] package, with a few key differences (see 'Key Differences with regexp').

The full syntax is specified below.

# Syntax
@@ -55,8 +57,8 @@ POSIX classes (inside normal character classes):
Composition:

def Match d, followed by e, followed by f
x|y Match x or y (prefer x)
xy|z Match xy or z (prefer xy)
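Under the 'prefer x' rule, x|y behaves like the standard library's default leftmost-first matching. A 'prefer the longer one' rule would instead resemble its leftmost-longest mode. The sketch below uses only [regexp] to show the difference on x|xy. It assumes, rather than demonstrates, that this engine agrees with the leftmost-first result.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstAlt returns the leftmost-first match: among alternatives that
// match at the same position, the earlier one wins ("prefer x").
func firstAlt(pattern, input string) string {
	return regexp.MustCompile(pattern).FindString(input)
}

// longestAlt returns the leftmost-longest (POSIX-style) match: the
// longest alternative wins.
func longestAlt(pattern, input string) string {
	re := regexp.MustCompile(pattern)
	re.Longest()
	return re.FindString(input)
}

func main() {
	fmt.Println(firstAlt(`x|xy`, "xy"))   // x
	fmt.Println(longestAlt(`x|xy`, "xy")) // xy
}
```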

Repetition (always greedy, preferring more):
@@ -94,10 +96,11 @@ Lookarounds:
Numeric ranges:

<x-y> Match any number from x to y (inclusive; x and y must be positive numbers)
\<x Match a literal '<' followed by x
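The <x-y> syntax has no counterpart in [regexp]. The hypothetical helper below models its described semantics on a whole string: a run of decimal digits whose value lies between x and y inclusive. The source does not say how the engine handles leading zeros or numbers embedded in longer digit runs, so the sketch does not try to mirror that behavior.

```go
package main

import (
	"fmt"
	"strconv"
)

// inNumericRange is a hypothetical helper, not part of this package. It
// reports whether s consists only of decimal digits and its value lies in
// [lo, hi], mirroring the documented meaning of <lo-hi>.
func inNumericRange(s string, lo, hi int) bool {
	if s == "" {
		return false
	}
	for _, c := range s {
		if c < '0' || c > '9' {
			return false
		}
	}
	n, err := strconv.Atoi(s)
	if err != nil {
		return false // value too large for an int
	}
	return n >= lo && n <= hi
}

func main() {
	for _, s := range []string{"5", "10", "20", "25", "x1"} {
		fmt.Printf("<5-20> against %q: %v\n", s, inNumericRange(s, 5, 20))
	}
}
```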

# Key Differences with regexp

The engine and the API differ from [regexp] in a few ways, some of them very subtle.
The key differences are mentioned below.

1. Greediness:
@@ -132,7 +135,7 @@ Rather than using primitives for return values, my engine defines two types that
values: a [Group] represents a capturing group, and a [Match] represents a list of groups.
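This is a minimal sketch of how the two types relate, using hypothetical definitions. The field names, the half-open offsets and the Text method are assumptions for illustration, not this package's actual [Group] and [Match] API.

```go
package main

import "fmt"

// Group is a hypothetical stand-in for [Group]: the half-open byte
// range [Start, End) of the input covered by one capturing group.
type Group struct {
	Start, End int
}

// Match is a hypothetical stand-in for [Match]: a list of groups where
// index 0 is the whole match and index i is the i-th capturing group.
type Match []Group

// Text returns the part of input covered by group i.
func (m Match) Text(input string, i int) string {
	return input[m[i].Start:m[i].End]
}

func main() {
	// Groups for a pattern such as x(y) matched against "xy".
	m := Match{{Start: 0, End: 2}, {Start: 1, End: 2}}
	fmt.Println(m.Text("xy", 0), m.Text("xy", 1)) // xy y
}
```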

[regexp] describes the names of all its matching methods with the regular expression Find(All)?(String)?(Submatch)?(Index)?. The
equivalent expression for this engine is shown below. Because this engine returns indices by default, it has no 'Index' component.

Find(All)?(String)?(Submatch)?
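Expanding the expression gives eight method names. The sketch below only enumerates what the expression matches. It makes no claim about the methods' parameters or return types.

```go
package main

import "fmt"

// expandNames lists every name matched by Find(All)?(String)?(Submatch)?
// by switching each optional component off and on.
func expandNames() []string {
	var names []string
	for _, all := range []string{"", "All"} {
		for _, str := range []string{"", "String"} {
			for _, sub := range []string{"", "Submatch"} {
				names = append(names, "Find"+all+str+sub)
			}
		}
	}
	return names
}

func main() {
	for _, n := range expandNames() {
		fmt.Println(n)
	}
}
```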
@@ -140,7 +143,7 @@ equivalent expression for this engine is:

If a function contains 'All' it returns all matches instead of just the leftmost one.

If a function contains 'String' it returns the matched text rather than its indices in the input string.

If a function contains 'Submatch' it returns the match, including all submatches found by
capturing groups.
@@ -156,5 +159,20 @@ and the input string:

The 0th group would contain 'xy' and the 1st group would contain 'y'. Any matching function without 'Submatch' in its name
returns only the 0th group (the entire match).
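The pattern x(y) below is an illustrative choice. The sketch uses the standard [regexp] package, which numbers groups the same way: group 0 is the whole match and group i is the i-th capturing group. In [regexp] the offsets come from the 'Index' variants, whereas this engine returns indices by default.

```go
package main

import (
	"fmt"
	"regexp"
)

// groups matches pattern against input with regexp and returns the text
// and the [start, end) offsets of every group: group 0 is the whole
// match, group i is the i-th capturing group.
func groups(pattern, input string) ([]string, []int) {
	re := regexp.MustCompile(pattern)
	return re.FindStringSubmatch(input), re.FindStringSubmatchIndex(input)
}

func main() {
	text, idx := groups(`x(y)`, "xy")
	fmt.Println(text) // [xy y]
	fmt.Println(idx)  // [0 2 1 2]

	// Without Submatch, only group 0 (the entire match) is returned.
	fmt.Println(regexp.MustCompile(`x(y)`).FindString("xy")) // xy
}
```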

# Feature Differences

The following features from [regexp] are (currently) NOT supported:
1. Named capturing groups
2. Non-greedy operators
3. Unicode character classes
4. Embedded flags (flags are passed as arguments to [Compile])
5. Literal text with \Q ... \E
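Of these, non-greedy operators are the easiest to see in [regexp] itself. The stdlib-only sketch below contrasts a+ with its non-greedy form a+?. Repetition in this engine is always greedy, so only the first result applies here.

```go
package main

import (
	"fmt"
	"regexp"
)

// greedyVsLazy returns the leftmost match of a+ (greedy: as many a's as
// possible) and of a+? (non-greedy: as few as possible) in input.
func greedyVsLazy(input string) (greedy, lazy string) {
	greedy = regexp.MustCompile(`a+`).FindString(input)
	lazy = regexp.MustCompile(`a+?`).FindString(input)
	return greedy, lazy
}

func main() {
	g, l := greedyVsLazy("aaa")
	fmt.Println(g, l) // aaa a
}
```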

The following features are not available in [regexp], but are supported in my engine:
1. Lookarounds
2. Numeric ranges
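For contrast, the sketch below shows [regexp] rejecting a lookahead at compile time and reading the numeric-range syntax as plain literal text. The patterns are illustrative choices, and only standard library calls are used.

```go
package main

import (
	"fmt"
	"regexp"
)

// lookaheadCompiles reports whether regexp accepts a lookahead. RE2
// syntax, which regexp implements, has no lookarounds, so this is false.
func lookaheadCompiles() bool {
	_, err := regexp.Compile(`a(?=b)`)
	return err == nil
}

// literalRange matches s against <1-5> as a whole. In regexp '<', '-'
// and '>' are ordinary literals here, so only the text "<1-5>" matches.
func literalRange(s string) bool {
	return regexp.MustCompile(`^<1-5>$`).MatchString(s)
}

func main() {
	fmt.Println(lookaheadCompiles())                    // false
	fmt.Println(literalRange("3"), literalRange("<1-5>")) // false true
}
```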

The goal is to shorten the first list and expand the second.
*/
package regex