Wrote documentation on syntax
parent 7431b1a7b2
commit 00570f07fe
@@ -1,7 +1,92 @@
/*
Package regex implements an NFA-based engine to search for
regular expressions in strings. The engine does not use backtracking,
and is therefore not vulnerable to catastrophic backtracking. The regex
syntax supported (a variation of Golang's) is specified in the Syntax section.
Package regex implements regular expression search, using a custom non-backtracking engine with support for lookarounds and numeric ranges.

The engine relies completely on UTF-8 code points. As such, it is capable of matching characters
from other languages, emojis and symbols.

The full syntax is specified below.

# Syntax

Single characters:

    .             Match any character. Newline matching is dependent on the RE_SINGLE_LINE flag.
    [abc]         Character class - match a, b or c
    [a-z]         Character range - match any character from a to z
    [^abc]        Negated character class - match any character except a, b and c
    [^a-z]        Negated character range - match any character outside the range a to z
    \[            Match a literal '['. Backslashes can escape any character with special meaning, including another backslash.
    \452          Match the character with the octal value 452 (up to 3 digits)
    \xFF          Match the character with the hex value FF (exactly 2 characters)
    \x{0000FF}    Match the character with the hex value 0000FF (exactly 6 characters)
    \n            Newline
    \a            Bell character
    \f            Form-feed character
    \r            Carriage return
    \t            Horizontal tab
    \v            Vertical tab

Perl classes:

    \d    Match any digit character ([0-9])
    \D    Match any non-digit character ([^0-9])
    \w    Match any word character ([a-zA-Z0-9_])
    \W    Match any non-word character ([^a-zA-Z0-9_])
    \s    Match any whitespace character ([ \t\n])
    \S    Match any non-whitespace character ([^ \t\n])
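
As a rough illustration of the Perl classes, the sketch below again relies on Go's standard regexp package rather than this package's API (which is not shown here). One known difference: the standard library's \s also matches \f and \r, so the inputs below only use spaces and tabs. The helper allMatches is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// allMatches returns every non-overlapping match of pattern in input,
// joined with commas so the result is easy to print and compare.
func allMatches(pattern, input string) string {
	re := regexp.MustCompile(pattern)
	return strings.Join(re.FindAllString(input, -1), ",")
}

func main() {
	fmt.Println(allMatches(`\d+`, "order 66, aisle 7")) // runs of digits
	fmt.Println(allMatches(`\D+`, "a1b22c"))            // runs of non-digits
	fmt.Println(allMatches(`\w+`, "snake_case it-is"))  // '_' is a word character, '-' is not
	fmt.Println(allMatches(`\W`, "a-b_c!"))             // single non-word characters
	fmt.Println(allMatches(`\S+`, "a b\tc"))            // runs of non-whitespace
}
```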

POSIX classes (inside normal character classes):

    [:digit:]     All digit characters ([0-9])
    [:upper:]     All upper-case letters ([A-Z])
    [:lower:]     All lower-case letters ([a-z])
    [:alpha:]     All letters ([a-zA-Z])
    [:alnum:]     All alphanumeric characters ([a-zA-Z0-9])
    [:xdigit:]    All hexadecimal characters ([a-fA-F0-9])
    [:blank:]     All blank characters ([ \t])
    [:space:]     All whitespace characters ([ \t\n\r\f\v])
    [:cntrl:]     All control characters ([\x00-\x1F\x7F])
    [:punct:]     All punctuation characters
    [:graph:]     All graphical characters ([\x21-\x7E])
    [:print:]     All graphical characters + space ([\x20-\x7E])
    [:word:]      All word characters (\w)
    [:ascii:]     All ASCII values ([\x00-\x7F])
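
Because POSIX classes only have meaning inside a bracket expression, the usable form is [[:digit:]] rather than a bare [:digit:]. The sketch below shows this nesting with Go's standard regexp package, used as a stand-in on the assumption that it treats these classes the same way; firstMatch is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch returns the leftmost match of pattern in input, or "" if
// there is none. It uses the standard library engine, not package regex.
func firstMatch(pattern, input string) string {
	return regexp.MustCompile(pattern).FindString(input)
}

func main() {
	// One upper-case letter followed by lower-case letters.
	fmt.Println(firstMatch(`[[:upper:]][[:lower:]]+`, "hello World"))

	// At least four hexadecimal digits in a row.
	fmt.Println(firstMatch(`[[:xdigit:]]{4,}`, "key: BEEF42"))

	// POSIX classes can be combined and negated like any other class member.
	fmt.Println(firstMatch(`[^[:alnum:][:space:]]`, "Hi, you!"))
}
```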

Composition:

    def     Match d, followed by e, followed by f
    x|y     Match x or y (prefer longer one)
    xy|z    Match xy or z
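
The "prefer longer one" rule for alternation is worth seeing in action. Go's standard regexp package prefers the leftmost alternative by default, but switches to leftmost-longest matching after a call to Longest, which is assumed here to approximate the behaviour described above. This is a sketch with the standard library, not with this package; match is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// match returns the leftmost match of pattern in input. When longest is
// true, the standard library engine is switched to leftmost-longest mode.
func match(pattern, input string, longest bool) string {
	re := regexp.MustCompile(pattern)
	if longest {
		re.Longest()
	}
	return re.FindString(input)
}

func main() {
	fmt.Println(match(`def`, "abcdefg", false)) // concatenation
	fmt.Println(match(`a|ab`, "ab", false))     // first alternative wins
	fmt.Println(match(`a|ab`, "ab", true))      // longer alternative wins
	fmt.Println(match(`xy|z`, "xz", false))     // parsed as (xy)|(z), not x(y|z)
}
```

The last call shows that concatenation binds tighter than alternation: the pattern matches "z" alone, not "xz".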

Repetition (always greedy, preferring more):

    x*        Match x zero or more times
    x+        Match x one or more times
    x?        Match x zero or one time
    x{m,n}    Match x between m and n times (inclusive)
    x{m,}     Match x at least m times
    x{,n}     Match x between 0 and n times (inclusive)
    x{m}      Match x exactly m times
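
Greedy repetition means each operator consumes as much input as it can. The sketch below demonstrates this with Go's standard regexp package as a stand-in. The x{,n} form is left out because it is not part of the standard library's syntax, so it cannot be demonstrated this way. The helper firstMatch is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch returns the leftmost match of pattern in input, or "" if
// there is none. It uses the standard library engine, not package regex.
func firstMatch(pattern, input string) string {
	return regexp.MustCompile(pattern).FindString(input)
}

func main() {
	fmt.Println(firstMatch(`a*`, "aaab"))       // takes all three a's
	fmt.Println(firstMatch(`a+`, "baaa"))       // skips b, then takes all a's
	fmt.Println(firstMatch(`colou?r`, "color")) // the u is optional
	fmt.Println(firstMatch(`a{2,3}`, "aaaa"))   // stops at the upper bound
	fmt.Println(firstMatch(`a{2,}`, "aaaa"))    // no upper bound
	fmt.Println(firstMatch(`a{2}`, "aaaa"))     // exactly two
}
```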

Grouping:

    (expr)      Create a capturing group. The contents of the group can be retrieved with [FindAllMatches]
    x(y|z)      Match x followed by y or z. Given a successful match, the contents of group 1 will include either y or z
    (?:expr)    Create a non-capturing group. The contents of the group aren't saved.
    x(?:y|z)    Match x followed by y or z. No groups are created.
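
The difference between capturing and non-capturing groups shows up in the number of submatches reported. The sketch below uses FindStringSubmatch from Go's standard regexp package as a stand-in, since the exact shape of [FindAllMatches] is not given here; groups is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// groups returns the full match at index 0 followed by the contents of
// each capturing group, using the standard library engine.
func groups(pattern, input string) []string {
	return regexp.MustCompile(pattern).FindStringSubmatch(input)
}

func main() {
	// Capturing group: index 0 is the whole match, index 1 is group 1.
	fmt.Printf("%q\n", groups(`x(y|z)`, "axz"))

	// Non-capturing group: only the whole match is reported.
	fmt.Printf("%q\n", groups(`x(?:y|z)`, "axz"))
}
```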

Assertions:

    ^     Match at the start of the input string. If RE_MULTILINE is enabled, it also matches at the start of every line.
    $     Match at the end of the input string. If RE_MULTILINE is enabled, it also matches at the end of every line.
    \A    Always match at the start of the string, regardless of RE_MULTILINE
    \z    Always match at the end of the string, regardless of RE_MULTILINE
    \b    Match at a word boundary (a word character followed by a non-word character, or vice-versa)
    \B    Match at a non-word boundary (a word character followed by a word character, or a non-word character followed by a non-word character)
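
To see how the anchors interact with multi-line input, the sketch below uses Go's standard regexp package, where the inline (?m) flag is assumed to play the same role as RE_MULTILINE. It does not use this package's [Compile] or [ReFlag], whose signatures are not shown here; allMatches is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// allMatches returns every non-overlapping match of pattern in input,
// joined with commas so the result is easy to print and compare.
func allMatches(pattern, input string) string {
	re := regexp.MustCompile(pattern)
	return strings.Join(re.FindAllString(input, -1), ",")
}

func main() {
	text := "one\ntwo"

	fmt.Println(allMatches(`^\w+`, text))      // start of input only
	fmt.Println(allMatches(`(?m)^\w+`, text))  // start of every line
	fmt.Println(allMatches(`(?m)\A\w+`, text)) // \A ignores multi-line mode
	fmt.Println(allMatches(`(?m)\w+$`, text))  // end of every line
	fmt.Println(allMatches(`(?m)\w+\z`, text)) // \z ignores multi-line mode

	fmt.Println(allMatches(`\bcat\b`, "cat concat cat.")) // whole words only
	fmt.Println(allMatches(`\Bcat`, "cat concat"))        // only inside a word
}
```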

# Flags

Flags are used to change the behavior of the engine. None of them are enabled by default. They are passed as an [ReFlag] slice to [Compile].
The list of flags, and their purpose, is provided in the type definition.
*/
package regex