Commit Graph

31 Commits

SHA1 Message Date
25cb79f01b Changed the value of EPSILON, so that we can use the NUL character
(which it used to be) in a regex; also added code to detect escaped
backslashes

Specifically, I replace an escaped backslash with a metacharacter, then
replace it back later on. This prevents problems, like detecting whether
the opening bracket is escaped in '\\[a]'.
2025-01-21 22:12:29 -05:00
47ec95f7bb Created function that returns a 'default' state 2025-01-19 21:45:07 -06:00
3f0360b9be Fixed bug where I used the 'lookaroundNumCaptureGroups' member of the wrong State struct 2025-01-09 10:39:04 -06:00
644ed15af0 Use new API for findAllMatches 2025-01-06 20:10:25 -06:00
61bced606e Added comments - certain members of State depend on the current match, should be reset 2024-12-16 22:32:22 -05:00
332c2fe5a2 Made lookarounds a little more efficient by only matching from (or to, in the case of lookbehind) the current index 2024-12-11 00:31:08 -05:00
437ca2ee57 Improved submatch tracking by storing all group indices as a part of the state, which is viewed as a 'thread' 2024-12-11 00:16:24 -05:00
11f7f1d746 Added fields to state, to determine capturing group information. 0th group refers to entire match 2024-12-09 01:05:01 -05:00
745fab9639 Clone lookaroundNFA when cloning a state; use compiled regex for
lookarounds instead of compiling a new one
2024-11-27 12:15:30 -05:00
393769f152 Accounted for last character being a newline when checking for EOS (we can be at the second-last character if the last one is a newline) 2024-11-27 11:44:39 -05:00
25c333bea4 Added function to determine if a state is a lookaround 2024-11-24 15:01:06 -05:00
77d19cd84e Added lookaround-related fields to State struct, added lookaround support to checkAssertion() 2024-11-22 00:11:51 -05:00
ea64ddc88a Removed unnecessary duplication of assertion checking 2024-11-20 10:38:41 -05:00
708a9e1303 Added field to denote all characters which an 'allChars' node _shouldn't_ match (useful for inverting character classes) 2024-11-20 09:39:24 -05:00
c56d81a335 Added Unicode support to dot metacharacter - it now matches _any_ Unicode character (almost) 2024-11-18 16:44:43 -05:00
8a1f1dc621 Added Unicode support
Replaced strings with rune-slices, which capture Unicode code points more
accurately.
2024-11-18 10:41:50 -05:00
21142e6e13 Wrote function to clone the NFA starting at a given state, and a function to find question mark operator (a? == (a|)) 2024-11-03 14:37:38 -05:00
dca81c1796 Replaced rune-slice parameters with string parameters in functions; avoids unnecessary conversion from strings to rune-slices 2024-11-01 01:53:50 -04:00
fccd3a76f5 Wrote function to check if the assertion of a state is true 2024-10-31 17:56:04 -04:00
0736e813c1 Fixed boneheaded mistake with checking assertion types 2024-10-31 17:14:03 -04:00
1aff6e2fa4 Added a field to State, that tells me what kind of assertion (if any) it is making. Also added function to check if a state's contents contain a given value (checks assertions), and to find all matches that a state has for a character 2024-10-31 17:13:34 -04:00
3778869567 Use stateContents type to allow a state to store multiple characters 2024-10-28 17:38:43 -04:00
aee24644e9 Use new unique_append function signature 2024-10-28 09:39:37 -04:00
ae219f763a Added alternate function, removed relevant code from main; also started working on escape characters 2024-10-27 15:30:33 -04:00
bf3060b672 Used 'unique append' to ensure that a transition can only contain a given state once 2024-10-27 12:52:59 -04:00
b327143fa2 Added function for concatenation and kleene star 2024-10-27 11:19:06 -04:00
9d3bc2b804 Fixed kleene star behavior, which used to behave like a '+' 2024-10-23 08:51:49 -04:00
bc11777ad5 Fixed Kleene Star matching 2024-10-22 17:07:01 -04:00
213da40c3b Allow one state to map to multiple states with the same transition, e.g. ab|aa 2024-10-22 14:35:03 -04:00
8394e7867e Fixed bug with last state detection 2024-10-21 23:17:10 -04:00
82b33f3c9a First commit 2024-10-21 23:08:52 -04:00
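
Note on 25cb79f01b: the escaped-backslash handling described in that commit could look roughly like the Go sketch below. This is an illustration only, not the repository's code. The placeholder rune (U+E000, from the Unicode private-use area) and the function names hideEscapedBackslashes, restoreEscapedBackslashes and isEscaped are all hypothetical. The idea is to swap every `\\` pair for a single placeholder rune before parsing, so that a simple "is the previous rune a backslash?" check no longer mistakes the '[' in '\\[a]' for an escaped bracket, and to swap the pair back afterwards.

```go
package main

import (
	"fmt"
	"strings"
)

// escapedBackslash is a hypothetical placeholder rune standing in for an
// escaped backslash (`\\`). The value actually used in the repository is
// not shown in the commit log; U+E000 is an assumption for this sketch.
const escapedBackslash = '\uE000'

// hideEscapedBackslashes replaces every `\\` pair with the placeholder rune.
// strings.ReplaceAll scans left to right without overlap, so `\\\[` becomes
// placeholder + `\[`, which keeps the bracket escaped.
func hideEscapedBackslashes(re string) string {
	return strings.ReplaceAll(re, `\\`, string(escapedBackslash))
}

// restoreEscapedBackslashes reverses hideEscapedBackslashes.
func restoreEscapedBackslashes(re string) string {
	return strings.ReplaceAll(re, string(escapedBackslash), `\\`)
}

// isEscaped reports whether the rune at index i is directly preceded by a
// backslash. It is only reliable after escaped backslashes have been hidden.
func isEscaped(runes []rune, i int) bool {
	return i > 0 && runes[i-1] == '\\'
}

func main() {
	// In `\\[a]` the backslash is escaped, so '[' opens a character class.
	hidden := []rune(hideEscapedBackslashes(`\\[a]`))
	fmt.Println(isEscaped(hidden, 1))

	// In `\[a]` the bracket itself is escaped.
	plain := []rune(hideEscapedBackslashes(`\[a]`))
	fmt.Println(isEscaped(plain, 1))

	// The original pattern is recovered once parsing is done.
	fmt.Println(restoreEscapedBackslashes(string(hidden)) == `\\[a]`)
}
```

A private-use rune is chosen here because it is unlikely to appear in a real pattern; a pattern that does contain it would be corrupted on restore, so a real implementation would need to pick its placeholder with that in mind.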