Updated documentation

implementPCREMatchingRules
Aadhavan Srinivasan 4 weeks ago
parent c577064977
commit eddd2ae700

@ -4,6 +4,8 @@ Package regex implements regular expression search, using a custom non-bracktrac
The engine relies completely on UTF-8 codepoints. As such, it is capable of matching characters
from other languages, emojis and symbols.
The API and regex syntax are largely compatible with that of the stdlib's [regexp], with a few key differences (see 'Key Differences with regexp').
The full syntax is specified below.
# Syntax
@ -55,8 +57,8 @@ POSIX classes (inside normal character classes):
Composition:
def Match d, followed by e, followed by f
x|y Match x or y (prefer longer one)
xy|z Match xy or z
x|y Match x or y (prefer x)
xy|z Match xy or z (prefer xy)
Repitition (always greedy, preferring more):
@ -94,10 +96,11 @@ Lookarounds:
Numeric ranges:
<x-y> Match any number from x to y (inclusive) (x and y must be positive numbers)
\<x Match a literal '<' followed by x
# Key Differences with regexp
The engine and the API differ from [regexp] in a number of ways, some of them very subtle.
The engine and the API differ from [regexp] in a few ways, some of them very subtle.
The key differences are mentioned below.
1. Greediness:
@ -132,7 +135,7 @@ Rather than using primitives for return values, my engine defines two types that
values: a [Group] represents a capturing group, and a [Match] represents a list of groups.
[regexp] specifies a regular expression that gives a list of all the matching functions that it supports. The
equivalent expression for this engine is:
equivalent expression for this engine is shown below. Note that 'Index' is the default.
Find(All)?(String)?(Submatch)?
@ -140,7 +143,7 @@ equivalent expression for this engine is:
If a function contains 'All' it returns all matches instead of just the leftmost one.
If a function contains 'String' it returns the matched text, rather than the indices.
If a function contains 'String' it returns the matched text, rather than the index in the string.
If a function contains 'Submatch' it returns the match, including all submatches found by
capturing groups.
@ -156,5 +159,20 @@ and the input string:
The 0th group would contain 'xy' and the 1st group would contain 'y'. Any matching function without 'Submatch' in its name
returns the 0-group.
# Feature Differences
The following features from [regexp] are (currently) NOT supported:
1. Named capturing groups
2. Non-greedy operators
3. Unicode character classes
4. Embedded flags (flags are passed as arguments to [Compile])
5. Literal text with \Q ... \E
The following features are not available in [regexp], but are supported in my engine:
1. Lookarounds
2. Numeric ranges
The goal is to shorten the first list, and expand the second.
*/
package regex

Loading…
Cancel
Save