Commit Graph

290 Commits

Author SHA1 Message Date
099612ae7f Bug fixes, changed the way I parse octal values 2025-01-20 18:04:05 -05:00
9115858261 Changed assignment of the unicode values by 1, so that EPSILON can now be 0xF0000 2025-01-20 17:08:07 -05:00
fb46ed62d9 Added tests for FindString 2025-01-19 22:56:47 -05:00
47ec95f7bb Created function that returns a 'default' state 2025-01-19 21:45:07 -06:00
a14ab81697 Updated function names, added new function 'FindString' that returns the _text_ of the match 2025-01-19 21:44:15 -06:00
7056026e10 Added a new class 'CHARCLASS', which represents a character class with some other postfixNodes in it. The 'except' field now contains a list of postfixNodes rather than runes 2025-01-19 21:43:21 -06:00
b81a2f8452 Added functions to find if a character is a valid hex value and a valid octal value 2025-01-19 21:31:18 -06:00
fcdb4a8868 Added another test, changed function calls to match new names 2025-01-19 21:30:56 -06:00
3a3333b38a New features, changed character class behavior
I added support for hex values (eg. \x0F), octal values (eg. \012) and
extended hex values (eg. \x{000F2A}). I also expanded the abilities of
character classes, to include things like escaped characters (eg. [aefp\)])
and character ranges _inside_ inverted character classes (eg. [^\w] which is
functionally equivalent to [\W]).
2025-01-19 21:26:56 -06:00
4376ccb77d Renamed function calls to use new names 2025-01-19 21:22:33 -06:00
3f0360b9be Fixed bug where I used the 'lookaroundNumCaptureGroups' member of the wrong State struct 2025-01-09 10:39:04 -06:00
0956dddd81 Fixed bug where I checked if flag was enabled before calling flag.Parse() 2025-01-09 10:38:35 -06:00
0b84806fc4 Added 'flags' to the Compile function, instead of maintaining global state to check whether certain features were enabled 2025-01-09 10:33:56 -06:00
24fa365be1 Moved some auxiliary functions into compile.go; use new API for compiling and finding matches 2025-01-06 20:14:57 -06:00
1da3f7f0e0 Changed API for match-finding functions - take in a Reg instead of start state and numGroups separately 2025-01-06 20:14:19 -06:00
8e8067482a Rewrote to use new API for compiling and finding matches 2025-01-06 20:12:18 -06:00
644ed15af0 Use new API for findAllMatches 2025-01-06 20:10:25 -06:00
c8613c1ba2 Major restructuring - added new type, changed return types for shuntingYard and thompson
I added a new function 'Compile' that calls shuntingYard and thompson. I also added
a new type 'Reg' that this function returns - it represents the starting state and contains
the number of capturing groups in the regex. I also rewrote shuntingYard and thompson
to return errors instead of panicking.
2025-01-06 20:08:24 -06:00
ddbcb309b0 Made shuntingYard return an error instead of panicking, moved it and thompson to compile.go 2025-01-06 12:29:04 -06:00
72263509d3 Rewrote behavior of '-m' flag to use the 'nth match' function from matching.go 2025-01-05 21:41:14 -06:00
4373d35216 Wrote function to find the 'n'th match of a regex 2025-01-05 21:40:53 -06:00
3fa4d0f75e Updated TODO 2025-01-03 19:18:00 -05:00
6f9173f771 Finished support for -m flag; refactoring pending 2025-01-03 19:17:24 -05:00
8a0586d107 Added support for printing specific match indices ('-m' and '-p' flags combined) 2025-01-03 15:49:14 -06:00
13ca954072 Started working on '-m num' flag : print the <num>th match 2024-12-19 04:29:05 -05:00
85eb13287e Updated TODO 2024-12-19 04:28:36 -05:00
e83d746ded Added more test cases 2024-12-18 15:22:50 -05:00
98f4c9e418 Added support for non-capturing groups 2024-12-18 15:22:43 -05:00
8d6e1a41a5 Fixed bug where a repeated capturing group eg. (a){3} wouldn't capture only the last iteration, like it should 2024-12-16 22:58:39 -05:00
93a5e24c8d Added more tests 2024-12-16 22:32:36 -05:00
61bced606e Added comments - certain members of State depend on the current match, should be reset 2024-12-16 22:32:22 -05:00
71cab59a89 Got rid of unnecessary special case to match at end-of-string
Instead, I tweaked the rest of the matching function, so that a special
check isn't necessary. If we are trying to match at the end of a string,
we skip any of the actual matching and proceed straight to finding
0-length matches.

This change was made because, with the special case, capturing groups
weren't getting updated if we had an end-of-string match.
2024-12-12 14:49:45 -05:00
8c8e209587 Removed return values that weren't being used 2024-12-12 14:35:06 -05:00
332c2fe5a2 Made lookarounds a little more efficient by only matching from (or to, in the case of lookbehind) the current index 2024-12-11 00:31:08 -05:00
3fda07280e Added more tests 2024-12-11 00:30:37 -05:00
e2b08f8d5f Updated TODO 2024-12-11 00:17:29 -05:00
84cccc73ec Added grouping tests 2024-12-11 00:16:35 -05:00
437ca2ee57 Improved submatch tracking by storing all group indices as a part of the state, which is viewed as a 'thread' 2024-12-11 00:16:24 -05:00
00902944f6 Added code to match capturing groups and store into a Group (used to be MatchIndex) 2024-12-09 01:28:18 -05:00
80ea262064 Updated test-case structs to reflect the name of the new type 2024-12-09 01:06:18 -05:00
f5eb9c8218 Defined postfixNodes for LPAREN and RPAREN 2024-12-09 01:05:47 -05:00
20fbd20994 Added helper function to expand a slice to a given length 2024-12-09 01:05:26 -05:00
11f7f1d746 Added fields to state, to determine capturing group information. 0th group refers to entire match 2024-12-09 01:05:01 -05:00
822d1f319f Added initial support for capturing groups 2024-12-09 01:04:31 -05:00
745fab9639 Clone lookaroundNFA when cloning a state; use compiled regex for
lookarounds instead of compiling a new one
2024-11-27 12:15:30 -05:00
34e9aedbd6 Compile lookaround regex to avoid compiling each time we want to use it 2024-11-27 12:15:01 -05:00
6208f32710 Added support for numeric ranges: <5-38> will match all numbers between 5 and 38, inclusive on both ends. Also print line number on which matches occur, if we are in printing (and single line) mode 2024-11-27 11:48:04 -05:00
cbd6ea136b If the NFA starts with an assertion, make sure it's true before doing anything else. Also, check for last-state _lookaround_ rather than just last state, before breaking (instead of aborting) when the assertion fails 2024-11-27 11:46:38 -05:00
eb6a044ecf Added angle brackets to list of special characters (which need to be escaped to be used literally) 2024-11-27 11:45:27 -05:00
393769f152 Accounted for last character being a newline when checking for EOS (we can be at the second-last character if the last one is a newline) 2024-11-27 11:44:39 -05:00