From 00570f07fea5f3d928f28a1832362624d5cebccb Mon Sep 17 00:00:00 2001
From: Aadhavan Srinivasan
Date: Thu, 30 Jan 2025 17:51:46 -0500
Subject: [PATCH] Wrote documentation on syntax

---
 regex/doc.go | 93 +++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 89 insertions(+), 4 deletions(-)

diff --git a/regex/doc.go b/regex/doc.go
index bc8f6c8..ff471eb 100644
--- a/regex/doc.go
+++ b/regex/doc.go
@@ -1,7 +1,92 @@
 /*
-Package regex implements an NFA-based engine to search for
-regular expressions in strings. The engine does not use backtracking,
-and is therefore not vulnerable to catastrophic backtracking. The regex
-syntax supported (a variation of Golang's) is specified in the Syntax section.
+Package regex implements regular expression search, using a custom non-backtracking engine with support for lookarounds and numeric ranges.
+
+The engine relies completely on UTF-8 codepoints. As such, it is capable of matching characters
+from other languages, emojis and symbols.
+
+The full syntax is specified below.
+
+# Syntax
+
+Single characters:
+
+	.		Match any character. Newline matching is dependent on the RE_SINGLE_LINE flag.
+	[abc]		Character class - match a, b or c
+	[a-z]		Character range - match any character from a to z
+	[^abc]		Negated character class - match any character except a, b and c
+	[^a-z]		Negated character range - match any character outside the range a to z
+	\[		Match a literal '['. Backslashes can escape any character with special meaning, including another backslash.
+	\452		Match the character with the octal value 452 (up to 3 digits)
+	\xFF		Match the character with the hex value FF (exactly 2 characters)
+	\x{0000FF}	Match the character with the hex value 0000FF (exactly 6 characters)
+	\n		Newline
+	\a		Bell character
+	\f		Form-feed character
+	\r		Carriage return
+	\t		Horizontal tab
+	\v		Vertical tab
+
+Perl classes:
+
+	\d		Match any digit character ([0-9])
+	\D		Match any non-digit character ([^0-9])
+	\w		Match any word character ([a-zA-Z0-9_])
+	\W		Match any non-word character ([^a-zA-Z0-9_])
+	\s		Match any whitespace character ([ \t\n])
+	\S		Match any non-whitespace character ([^ \t\n])
+
+POSIX classes (inside normal character classes):
+
+	[:digit:]	All digit characters ([0-9])
+	[:upper:]	All upper-case letters ([A-Z])
+	[:lower:]	All lower-case letters ([a-z])
+	[:alpha:]	All letters ([a-zA-Z])
+	[:alnum:]	All alphanumeric characters ([a-zA-Z0-9])
+	[:xdigit:]	All hexadecimal characters ([a-fA-F0-9])
+	[:blank:]	All blank characters ([ \t])
+	[:space:]	All whitespace characters ([ \t\n\r\f\v])
+	[:cntrl:]	All control characters ([\x00-\x1F\x7F])
+	[:punct:]	All punctuation characters
+	[:graph:]	All graphical characters ([\x21-\x7E])
+	[:print:]	All graphical characters + space ([\x20-\x7E])
+	[:word:]	All word characters (\w)
+	[:ascii:]	All ASCII values ([\x00-\x7F])
+
+Composition:
+
+	def		Match d, followed by e, followed by f
+	x|y		Match x or y (prefer the longer one)
+	xy|z		Match xy or z
+
+Repetition (always greedy, preferring more):
+
+	x*		Match x zero or more times
+	x+		Match x one or more times
+	x?		Match x zero or one time
+	x{m,n}		Match x between m and n times (inclusive)
+	x{m,}		Match x at least m times
+	x{,n}		Match x between 0 and n times (inclusive)
+	x{m}		Match x exactly m times
+
+Grouping:
+
+	(expr)		Create a capturing group. The contents of the group can be retrieved with [FindAllMatches]
+	x(y|z)		Match x followed by y or z.
+			Given a successful match, the contents of group 1 will include either y or z
+	(?:expr)	Create a non-capturing group. The contents of the group aren't saved.
+	x(?:y|z)	Match x followed by y or z. No groups are created.
+
+Assertions:
+
+	^		Match at the start of the input string. If RE_MULTILINE is enabled, it also matches at the start of every line.
+	$		Match at the end of the input string. If RE_MULTILINE is enabled, it also matches at the end of every line.
+	\A		Always match at the start of the string, regardless of RE_MULTILINE
+	\z		Always match at the end of the string, regardless of RE_MULTILINE
+	\b		Match at a word boundary (a word character followed by a non-word character, or vice-versa)
+	\B		Match at a non-word boundary (a word character followed by a word character, or a non-word character followed by a non-word character)
+
+# Flags
+
+Flags are used to change the behavior of the engine. None of them are enabled by default. They are passed as an [ReFlag] slice to [Compile].
+The list of flags, and their purpose, is provided in the type definition.
 */
 package regex