Wrote documentation on syntax

2025-01-30 17:51:46 -05:00
parent 7431b1a7b2
commit 00570f07fe
1 changed file with 89 additions and 4 deletions
--- a/regex/doc.go
+++ b/regex/doc.go
@@ -1,7 +1,92 @@
 /*
-Package regex implements an NFA-based engine to search for
-regular expressions in strings. The engine does not use backtracking,
-and is therefore not vulnerable to catastrophic backtracking. The regex
-syntax supported (a variation of Golang's) is specified in the Syntax section.
+Package regex implements regular expression search using a custom non-backtracking engine, with support for lookarounds and numeric ranges.
+
+The engine operates on Unicode code points decoded from UTF-8, rather than on raw bytes, so it can match
+characters from other languages, as well as emoji and symbols.
+
+The full syntax is specified below.
+
+# Syntax
+
+Single characters:
+
+	.				Match any character. Whether it also matches a newline depends on the RE_SINGLE_LINE flag.
+	[abc]			Character class - match a, b or c
+	[a-z]			Character range - match any character from a to z
+	[^abc]			Negated character class - match any character except a, b and c
+	[^a-z]			Negated character range - match any character outside the range a to z
+	\[				Match a literal '['. Backslashes can escape any character with special meaning, including another backslash.
+	\452			Match the character with the octal value 452 (up to 3 digits)
+	\xFF			Match the character with the hex value FF (exactly 2 hex digits)
+	\x{0000FF}		Match the character with the hex value 0000FF (exactly 6 hex digits)
+	\n				Newline
+	\a				Bell character
+	\f				Form-feed character
+	\r				Carriage return
+	\t				Horizontal tab
+	\v				Vertical tab
+
+Perl classes:
+
+	\d				Match any digit character ([0-9])
+	\D				Match any non-digit character ([^0-9])
+	\w				Match any word character ([a-zA-Z0-9_])
+	\W				Match any non-word character ([^a-zA-Z0-9_])
+	\s				Match any whitespace character ([ \t\n])
+	\S				Match any non-whitespace character ([^ \t\n])
+
+POSIX classes (valid only inside a bracketed character class, e.g. [[:digit:]]):
+
+	[:digit:]		All digit characters ([0-9])
+	[:upper:]		All upper-case letters ([A-Z])
+	[:lower:]		All lower-case letters ([a-z])
+	[:alpha:]		All letters ([a-zA-Z])
+	[:alnum:]		All alphanumeric characters ([a-zA-Z0-9])
+	[:xdigit:]		All hexadecimal characters ([a-fA-F0-9])
+	[:blank:]		All blank characters ([ \t])
+	[:space:]		All whitespace characters ([ \t\n\r\f\v])
+	[:cntrl:]		All control characters ([\x00-\x1F\x7F])
+	[:punct:]		All punctuation characters
+	[:graph:]		All graphical characters ([\x21-\x7E])
+	[:print:]		All graphical characters + space ([\x20-\x7E])
+	[:word:]		All word characters (\w)
+	[:ascii:]		All ASCII values ([\x00-\x7F])
+
+Composition:
+
+	def				Match d, followed by e, followed by f
+	x|y				Match x or y (preferring the longer match)
+	xy|z			Match xy or z
+
+Repetition (always greedy, matching as many times as possible):
+
+	x*				Match x zero or more times
+	x+				Match x one or more times
+	x?				Match x zero or one time
+	x{m,n}			Match x between m and n times (inclusive)
+	x{m,}			Match x at least m times
+	x{,n}			Match x between 0 and n times (inclusive)
+	x{m}			Match x exactly m times
+
+Grouping:
+
+	(expr)			Create a capturing group. The contents of the group can be retrieved with [FindAllMatches].
+	x(y|z)			Match x followed by y or z. On a successful match, group 1 contains either y or z.
+	(?:expr)		Create a non-capturing group. The contents of the group aren't saved.
+	x(?:y|z)		Match x followed by y or z. No groups are created.
+
+Assertions:
+
+	^				Match at the start of the input string. If RE_MULTILINE is enabled, it also matches at the start of every line.
+	$				Match at the end of the input string. If RE_MULTILINE is enabled, it also matches at the end of every line.
+	\A				Match only at the start of the input string, regardless of RE_MULTILINE
+	\z				Match only at the end of the input string, regardless of RE_MULTILINE
+	\b				Match at a word boundary (a word character followed by a non-word character, or vice-versa)
+	\B				Match at a non-word boundary (between two word characters, or between two non-word characters)
+
+# Flags
+
+Flags change the behavior of the engine. None of them are enabled by default. They are passed to [Compile] as a slice of [ReFlag] values.
+Each flag and its purpose is documented in the [ReFlag] type definition.
 */
 package regex
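The numeric escapes \452, \xFF and \x{0000FF} each name a single code point. The package's own escape parser is not shown in this commit, so what follows is only a sketch: a hypothetical helper, decodeEscape, that converts the digits of such an escape into a rune with strconv.ParseUint (base 8 for octal, base 16 for hex) and rejects values that are not valid Unicode code points.

```go
package main

import (
	"fmt"
	"strconv"
	"unicode/utf8"
)

// decodeEscape converts the digits of a numeric escape into the code point it
// names: base 8 for octal escapes such as \452, base 16 for \xFF or \x{0000FF}.
// Hypothetical helper for illustration only.
func decodeEscape(digits string, base int) (rune, error) {
	v, err := strconv.ParseUint(digits, base, 32)
	if err != nil {
		return 0, err
	}
	r := rune(v)
	// Reject surrogates, values above U+10FFFF and anything that wrapped negative.
	if !utf8.ValidRune(r) {
		return 0, fmt.Errorf("escape value %#x is not a valid code point", v)
	}
	return r, nil
}

func main() {
	for _, e := range []struct {
		digits string
		base   int
	}{
		{"452", 8},     // octal: 4*64 + 5*8 + 2 = 298
		{"FF", 16},     // hex: 255
		{"0000FF", 16}, // six hex digits, same value: 255
	} {
		r, err := decodeEscape(e.digits, e.base)
		if err != nil {
			fmt.Println(e.digits, "error:", err)
			continue
		}
		fmt.Printf("%s (base %d) -> U+%04X %q\n", e.digits, e.base, r, r)
	}
}
```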
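Counted repetition can be expressed with the simpler operators: x{m,n} is m mandatory copies of x followed by n-m optional ones, and x{m,} is m copies followed by x*. Whether this package lowers counted repetition this way internally is not stated; the hypothetical expandRepeat below only illustrates the equivalence for a single-character atom, using max < 0 to stand for a missing upper bound.

```go
package main

import (
	"fmt"
	"strings"
)

// expandRepeat rewrites atom{lo,hi} using only concatenation, '?' and '*'.
// hi < 0 means "no upper bound" (the x{m,} form). Hypothetical helper for
// illustration; atom is assumed to be a single character.
func expandRepeat(atom string, lo, hi int) string {
	var b strings.Builder
	for i := 0; i < lo; i++ {
		b.WriteString(atom) // mandatory copies
	}
	if hi < 0 {
		b.WriteString(atom + "*") // unbounded tail
		return b.String()
	}
	for i := lo; i < hi; i++ {
		b.WriteString(atom + "?") // optional copies
	}
	return b.String()
}

func main() {
	fmt.Println(expandRepeat("x", 2, 4))  // x{2,4} -> xxx?x?
	fmt.Println(expandRepeat("x", 2, -1)) // x{2,}  -> xxx*
	fmt.Println(expandRepeat("x", 0, 2))  // x{,2}  -> x?x?
	fmt.Println(expandRepeat("x", 3, 3))  // x{3}   -> xxx
}
```

Because every '?' and '*' is greedy, the expanded forms keep the "as many as possible" behavior described for repetition.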