099612ae7f
Bug fixes, changed the way I parse octal values
2025-01-20 18:04:05 -05:00
9115858261
Shifted the assigned Unicode values by 1, so that EPSILON can now be 0xF0000
2025-01-20 17:08:07 -05:00
fb46ed62d9
Added tests for FindString
2025-01-19 22:56:47 -05:00
47ec95f7bb
Created function that returns a 'default' state
2025-01-19 21:45:07 -06:00
a14ab81697
Updated function names, added new function 'FindString' that returns the _text_ of the match
2025-01-19 21:44:15 -06:00
7056026e10
Added a new class 'CHARCLASS', which represents a character class with some other postfixNodes in it. The 'except' field now contains a list of postfixNodes rather than runes
2025-01-19 21:43:21 -06:00
b81a2f8452
Added functions to find if a character is a valid hex value and a valid octal value
2025-01-19 21:31:18 -06:00
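A minimal sketch of what the hex and octal validity checks in the commit above might look like. The names `isHex` and `isOctal` are hypothetical; the log does not show the actual function names or signatures.

```go
package main

import "fmt"

// isHex reports whether c is a valid hexadecimal digit (0-9, a-f, A-F).
// Hypothetical name: the commit log does not show the real function.
func isHex(c rune) bool {
	return (c >= '0' && c <= '9') ||
		(c >= 'a' && c <= 'f') ||
		(c >= 'A' && c <= 'F')
}

// isOctal reports whether c is a valid octal digit (0 through 7).
// Hypothetical name, as above.
func isOctal(c rune) bool {
	return c >= '0' && c <= '7'
}

func main() {
	fmt.Println(isHex('F'), isHex('g'))     // true false
	fmt.Println(isOctal('7'), isOctal('8')) // true false
}
```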
fcdb4a8868
Added another test, changed function calls to match new names
2025-01-19 21:30:56 -06:00
3a3333b38a
New features, changed character class behavior
...
I added support for hex values (e.g. \x0F), octal values (e.g. \012) and
extended hex values (e.g. \x{000F2A}). I also expanded the abilities of
character classes, to include things like escaped characters (e.g. [aefp\)])
and character ranges _inside_ inverted character classes (e.g. [^\w] which is
functionally equivalent to [\W]).
2025-01-19 21:26:56 -06:00
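The two hex forms described above (\x0F and \x{000F2A}) could be decoded roughly as follows. This is a sketch under assumptions, not the code from the commit: `parseHexEscape` is a hypothetical helper that receives the text after `\x`, and it leans on the standard `strconv` package rather than whatever the parser does internally.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
	"unicode"
)

// parseHexEscape decodes the text that follows `\x`. It accepts exactly two
// hex digits ("0F") or a braced form ("{000F2A}") and returns the decoded
// rune plus the number of bytes consumed. Hypothetical helper.
func parseHexEscape(s string) (rune, int, error) {
	var digits string
	var consumed int
	if strings.HasPrefix(s, "{") {
		end := strings.IndexByte(s, '}')
		if end < 0 {
			return 0, 0, errors.New("unterminated \\x{...} escape")
		}
		digits, consumed = s[1:end], end+1
	} else {
		if len(s) < 2 {
			return 0, 0, errors.New("\\x needs two hex digits")
		}
		digits, consumed = s[:2], 2
	}
	v, err := strconv.ParseUint(digits, 16, 32)
	if err != nil {
		return 0, 0, fmt.Errorf("invalid hex escape %q: %w", digits, err)
	}
	if v > unicode.MaxRune {
		return 0, 0, fmt.Errorf("hex escape %q is outside the Unicode range", digits)
	}
	return rune(v), consumed, nil
}

func main() {
	r, n, err := parseHexEscape("{000F2A}rest")
	fmt.Printf("%#x %d %v\n", r, n, err) // 0xf2a 8 <nil>
}
```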
4376ccb77d
Renamed function calls to use new names
2025-01-19 21:22:33 -06:00
3f0360b9be
Fixed bug where I used the 'lookaroundNumCaptureGroups' member of the wrong State struct
2025-01-09 10:39:04 -06:00
0956dddd81
Fixed bug where I checked if flag was enabled before calling flag.Parse()
2025-01-09 10:38:35 -06:00
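The bug class fixed above can be reproduced with the standard `flag` package: a flag variable holds its default until `Parse` runs. The sketch below uses a private `FlagSet` and a hypothetical `-v` flag so it does not depend on the real command line; the project's actual flag names are not shown in the log.

```go
package main

import (
	"flag"
	"fmt"
)

// readVerbose reads a hypothetical -v flag twice: once before Parse (always
// the default) and once after Parse (the value taken from args).
func readVerbose(args []string) (before, after bool, err error) {
	fs := flag.NewFlagSet("demo", flag.ContinueOnError)
	verbose := fs.Bool("v", false, "enable verbose output")

	before = *verbose // too early: Parse has not run yet
	if err = fs.Parse(args); err != nil {
		return before, false, err
	}
	return before, *verbose, nil
}

func main() {
	before, after, err := readVerbose([]string{"-v"})
	fmt.Println(before, after, err) // false true <nil>
}
```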
0b84806fc4
Added 'flags' to the Compile function, instead of maintaining global state to check whether certain features were enabled
2025-01-09 10:33:56 -06:00
24fa365be1
Moved some auxiliary functions into compile.go; use new API for compiling and finding matches
2025-01-06 20:14:57 -06:00
1da3f7f0e0
Changed API for match-finding functions - take in a Reg instead of start state and numGroups separately
2025-01-06 20:14:19 -06:00
8e8067482a
Rewrote to use new API for compiling and finding matches
2025-01-06 20:12:18 -06:00
644ed15af0
Use new API for findAllMatches
2025-01-06 20:10:25 -06:00
c8613c1ba2
Major restructuring - added new type, changed return types for shuntingYard and thompson
...
I added a new function 'Compile' that calls shuntingYard and thompson. I also added
a new type 'Reg' that this function returns - it represents the starting state and contains
the number of capturing groups in the regex. I also rewrote shuntingYard and thompson
to return errors instead of panicking.
2025-01-06 20:08:24 -06:00
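A rough sketch of the shape this restructuring describes: a `Compile` function that returns a `Reg` and an error rather than panicking. Only the names `Compile` and `Reg` come from the commit message; the field names, the empty `State` type, and the body are assumptions. The body is a stand-in that only checks parentheses and counts groups (ignoring escapes), not the real shuntingYard/thompson pipeline.

```go
package main

import (
	"errors"
	"fmt"
)

// State is a placeholder for an NFA state; the real struct is not shown in the log.
type State struct{}

// Reg bundles the start state with the number of capturing groups.
// Field names are assumed for illustration.
type Reg struct {
	start     *State
	numGroups int
}

// Compile is a toy stand-in: it validates parentheses and counts groups,
// reporting malformed input through an error value instead of a panic.
func Compile(re string) (Reg, error) {
	depth, groups := 0, 0
	for _, c := range re {
		switch c {
		case '(':
			depth++
			groups++
		case ')':
			if depth == 0 {
				return Reg{}, errors.New("unmatched ')'")
			}
			depth--
		}
	}
	if depth != 0 {
		return Reg{}, errors.New("unmatched '('")
	}
	return Reg{start: &State{}, numGroups: groups}, nil
}

func main() {
	if _, err := Compile("(a"); err != nil {
		fmt.Println("compile error:", err) // the caller decides how to react
	}
	reg, _ := Compile("(a)(b)")
	fmt.Println(reg.numGroups) // 2
}
```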
ddbcb309b0
Made shuntingYard return an error instead of panicking, moved it and thompson to compile.go
2025-01-06 12:29:04 -06:00
72263509d3
Rewrote behavior of '-m' flag to use the 'nth match' function from matching.go
2025-01-05 21:41:14 -06:00
4373d35216
Wrote function to find the 'n'th match of a regex
2025-01-05 21:40:53 -06:00
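The idea of an 'n'th match lookup can be illustrated with Go's standard `regexp` package standing in for this project's engine. `nthMatch` is a hypothetical name, and the 1-based numbering is an assumption; the log does not say how the real function counts.

```go
package main

import (
	"fmt"
	"regexp"
)

// nthMatch returns the n-th (1-based, by assumption) match of pattern in s.
// It uses the standard library engine, not the engine from this repository.
func nthMatch(pattern, s string, n int) (string, error) {
	if n < 1 {
		return "", fmt.Errorf("match number must be at least 1, got %d", n)
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", err
	}
	// Passing n as the limit stops the search once n matches are found.
	all := re.FindAllString(s, n)
	if len(all) < n {
		return "", fmt.Errorf("only %d match(es) found, wanted match %d", len(all), n)
	}
	return all[n-1], nil
}

func main() {
	m, err := nthMatch(`[0-9]+`, "a1 b22 c333", 2)
	fmt.Println(m, err) // 22 <nil>
}
```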
3fa4d0f75e
Updated TODO
2025-01-03 19:18:00 -05:00
6f9173f771
Finished support for -m flag; refactoring pending
2025-01-03 19:17:24 -05:00
8a0586d107
Added support for printing specific match indices ('-m' and '-p' flags combined)
2025-01-03 15:49:14 -06:00
13ca954072
Started working on '-m num' flag: print the <num>th match
2024-12-19 04:29:05 -05:00
85eb13287e
Updated TODO
2024-12-19 04:28:36 -05:00
e83d746ded
Added more test cases
2024-12-18 15:22:50 -05:00
98f4c9e418
Added support for non-capturing groups
2024-12-18 15:22:43 -05:00
8d6e1a41a5
Fixed bug where a repeated capturing group, e.g. (a){3}, didn't capture only the last iteration, as it should
2024-12-16 22:58:39 -05:00
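For reference, the "last iteration" behavior this fix targets is the same one Go's standard `regexp` package exhibits. The sketch below uses that package, not this project's engine, and `submatchIndex` is a hypothetical wrapper.

```go
package main

import (
	"fmt"
	"regexp"
)

// submatchIndex returns the index pairs for the whole match followed by each
// capturing group, using the standard library engine for comparison.
func submatchIndex(pattern, s string) []int {
	return regexp.MustCompile(pattern).FindStringSubmatchIndex(s)
}

func main() {
	// Whole match is [0,3); group 1 holds only the third 'a', at [2,3).
	fmt.Println(submatchIndex(`(a){3}`, "aaa")) // [0 3 2 3]
}
```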
93a5e24c8d
Added more tests
2024-12-16 22:32:36 -05:00
61bced606e
Added comments - certain members of State depend on the current match, should be reset
2024-12-16 22:32:22 -05:00
71cab59a89
Got rid of unnecessary special case to match at end-of-string
...
Instead, I tweaked the rest of the matching function, so that a special
check isn't necessary. If we are trying to match at the end of a string,
we skip any of the actual matching and proceed straight to finding
0-length matches.
This change was made because, with the special case, capturing groups
weren't getting updated if we had an end-of-string match.
2024-12-12 14:49:45 -05:00
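The behavior described above (zero-length matches at end-of-string that still update capturing groups) can be observed in Go's standard `regexp` package, used here purely as a point of comparison. The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// allMatches returns the [start, end) index pair of every match.
func allMatches(pattern, s string) [][]int {
	return regexp.MustCompile(pattern).FindAllStringIndex(s, -1)
}

// firstSubmatch returns the index pairs of the first match and its groups.
func firstSubmatch(pattern, s string) []int {
	return regexp.MustCompile(pattern).FindStringSubmatchIndex(s)
}

func main() {
	// `a*` finds an empty match before 'b' and another at end-of-string.
	fmt.Println(allMatches(`a*`, "b")) // [[0 0] [1 1]]

	// The group is still recorded for the zero-length end-of-string match.
	fmt.Println(firstSubmatch(`(a*)$`, "b")) // [1 1 1 1]
}
```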
8c8e209587
Removed return values that weren't being used
2024-12-12 14:35:06 -05:00
332c2fe5a2
Made lookarounds a little more efficient by only matching from (or to, in the case of lookbehind) the current index
2024-12-11 00:31:08 -05:00
3fda07280e
Added more tests
2024-12-11 00:30:37 -05:00
e2b08f8d5f
Updated TODO
2024-12-11 00:17:29 -05:00
84cccc73ec
Added grouping tests
2024-12-11 00:16:35 -05:00
437ca2ee57
Improved submatch tracking by storing all group indices as a part of the state, which is viewed as a 'thread'
2024-12-11 00:16:24 -05:00
00902944f6
Added code to match capturing groups and store into a Group (used to be MatchIndex)
2024-12-09 01:28:18 -05:00
80ea262064
Updated test-case structs to reflect the name of the new type
2024-12-09 01:06:18 -05:00
f5eb9c8218
Defined postfixNodes for LPAREN and RPAREN
2024-12-09 01:05:47 -05:00
20fbd20994
Added helper function to expand a slice to a given length
2024-12-09 01:05:26 -05:00
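One possible shape for a slice-expanding helper like the one mentioned above, sketched with generics (Go 1.18 or later). The name `expandSlice` and the choice to pad with zero values are assumptions; the log does not show the real helper.

```go
package main

import "fmt"

// expandSlice returns s grown to at least n elements, padding with zero
// values. A slice that is already long enough is returned unchanged.
// Hypothetical helper.
func expandSlice[T any](s []T, n int) []T {
	if len(s) >= n {
		return s
	}
	return append(s, make([]T, n-len(s))...)
}

func main() {
	fmt.Println(expandSlice([]int{1, 2}, 5)) // [1 2 0 0 0]
}
```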
11f7f1d746
Added fields to state, to determine capturing group information. 0th group refers to entire match
2024-12-09 01:05:01 -05:00
822d1f319f
Added initial support for capturing groups
2024-12-09 01:04:31 -05:00
745fab9639
Clone lookaroundNFA when cloning a state; use compiled regex for
...
lookarounds instead of compiling a new one
2024-11-27 12:15:30 -05:00
34e9aedbd6
Compile lookaround regex to avoid compiling each time we want to use it
2024-11-27 12:15:01 -05:00
6208f32710
Added support for numeric ranges: <5-38> will match all numbers between 5 and 38, inclusive on both ends. Also print line number on which matches occur, if we are in printing (and single line) mode
2024-11-27 11:48:04 -05:00
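To make the inclusive semantics of a range like <5-38> concrete, here is a small sketch that parses the token and tests membership. It illustrates the meaning only; how the engine actually compiles numeric ranges is not shown in the log, and both function names are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseNumRange extracts the bounds from a token of the form "<lo-hi>".
func parseNumRange(tok string) (lo, hi int, err error) {
	if len(tok) < 2 || tok[0] != '<' || tok[len(tok)-1] != '>' {
		return 0, 0, fmt.Errorf("numeric range %q must be wrapped in angle brackets", tok)
	}
	parts := strings.SplitN(tok[1:len(tok)-1], "-", 2)
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("numeric range %q needs the form <lo-hi>", tok)
	}
	if lo, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, err
	}
	if hi, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, err
	}
	if lo > hi {
		return 0, 0, fmt.Errorf("numeric range %q has its bounds reversed", tok)
	}
	return lo, hi, nil
}

// inRange reports whether n lies in [lo, hi], inclusive on both ends.
func inRange(n, lo, hi int) bool {
	return n >= lo && n <= hi
}

func main() {
	lo, hi, err := parseNumRange("<5-38>")
	fmt.Println(lo, hi, err)                              // 5 38 <nil>
	fmt.Println(inRange(5, lo, hi), inRange(39, lo, hi)) // true false
}
```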
cbd6ea136b
If the NFA starts with an assertion, make sure it's true before doing anything else. Also, check for last-state _lookaround_ rather than just last state, before breaking (instead of aborting) when the assertion fails
2024-11-27 11:46:38 -05:00
eb6a044ecf
Added angle brackets to list of special characters (which need to be escaped to be used literally)
2024-11-27 11:45:27 -05:00
393769f152
Accounted for last character being a newline when checking for EOS (we can be at the second-last character if the last one is a newline)
2024-11-27 11:44:39 -05:00
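The end-of-string rule described in the commit above might be expressed as a small predicate like this one. `atEndOfString` is a hypothetical name, and the sketch assumes the input is held as a slice of runes.

```go
package main

import "fmt"

// atEndOfString reports whether index i counts as end-of-string: either i is
// past the last rune, or i sits on a final newline. Hypothetical helper.
func atEndOfString(s []rune, i int) bool {
	if i == len(s) {
		return true
	}
	return i == len(s)-1 && s[i] == '\n'
}

func main() {
	line := []rune("ab\n")
	fmt.Println(atEndOfString(line, 1)) // false
	fmt.Println(atEndOfString(line, 2)) // true: only the trailing newline remains
	fmt.Println(atEndOfString(line, 3)) // true
}
```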