4 Commits

2 changed files with 39 additions and 7 deletions

@@ -1110,10 +1110,11 @@ func thompson(re []postfixNode) (Reg, error) {
}
// Compiles the given regular expression into a Reg type, suitable for use with the
// matching functions. The second return value is non-nil if a compilation error has
// occurred. As such, the error value must be checked before using the Reg returned by this function.
// The second parameter is an optional list of flags, passed to the parsing function shuntingYard.
// Compile compiles the given regular expression into a [Reg].
//
// A non-nil error indicates that the regex was invalid; the error message should provide
// detailed information on the nature of the error.
// The second parameter is a sequence of zero or more [ReFlag] values that modify the behavior of the regex.
func Compile(re string, flags ...ReFlag) (Reg, error) {
nodes, err := shuntingYard(re, flags...)
if err != nil {
@@ -1125,3 +1126,12 @@ func Compile(re string, flags ...ReFlag) (Reg, error) {
}
return reg, nil
}
// MustCompile panics if Compile returns an error. They are identical in all other respects.
func MustCompile(re string, flags ...ReFlag) Reg {
reg, err := Compile(re, flags...)
if err != nil {
panic(err)
}
return reg
}

@@ -95,9 +95,31 @@ Numeric ranges:
<x-y> Match any number from x to y (inclusive) (x and y must be positive numbers)
# Flags
Flags are used to change the behavior of the engine. None of them are enabled by default. They are passed as variadic arguments to [Compile].
The list of flags is provided in the type definition for [ReFlag].
# Key Differences with regexp
The engine and the API differ from [regexp] in a number of ways, some of them very subtle.
The key differences are mentioned below.
1. Greediness:
This engine does not support non-greedy operators. All operators are always greedy in nature, and will try
to match as much as they can, while still allowing for a successful match. For example, given the regex:
y*y
The engine will match as many 'y's as it can, while still allowing the trailing 'y' to be matched.
Another, more subtle example is the following regex:
x|xx
While the stdlib implementation (and most other engines) will prefer matching the first item of the alternation,
this engine will _always_ go for the longest possible match, regardless of the order of the alternation.
2. Byte-slices and runes:
This engine does not support byte-slices. When a matching function receives a string, it converts it into a
rune-slice to iterate through it. While this has some space overhead, the convenience of built-in Unicode
support made the tradeoff worth it.
*/
package regex
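
Both differences described in the doc comment can be observed with the standard library alone. The sketch below does not use this engine: it assumes that the stdlib's leftmost-longest mode (Regexp.Longest) is a reasonable stand-in for the "always longest match" behavior, and the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// leftmostFirst returns the match chosen by the stdlib's default semantics,
// which prefer the first alternative that succeeds.
func leftmostFirst(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

// leftmostLongest returns the match chosen after switching the stdlib regex
// to leftmost-longest mode, used here to approximate the behavior described above.
func leftmostLongest(pattern, s string) string {
	re := regexp.MustCompile(pattern)
	re.Longest()
	return re.FindString(s)
}

func main() {
	// Alternation order matters by default, but not in longest-match mode.
	fmt.Printf("%q\n", leftmostFirst("x|xx", "xx"))   // "x"
	fmt.Printf("%q\n", leftmostLongest("x|xx", "xx")) // "xx"

	// A greedy star still leaves one 'y' for the trailing literal.
	fmt.Printf("%q\n", leftmostFirst("y*y", "yyyy")) // "yyyy"

	// Converting a string to a rune slice indexes by code point, not by byte.
	s := "h\u00e9llo"
	fmt.Println(len(s), len([]rune(s))) // 6 5
}
```

The last line also hints at the space overhead mentioned above: each rune occupies four bytes, whereas the UTF-8 string needs only one or two bytes per character here.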