Aadhavan Srinivasan
61bced606e
Added comments - certain members of State depend on the current match, should be reset
6 days ago
Aadhavan Srinivasan
71cab59a89
Got rid of unnecessary special case to match at end-of-string
Instead, I tweaked the rest of the matching function, so that a special
check isn't necessary. If we are trying to match at the end of a string,
we skip any of the actual matching and proceed straight to finding
0-length matches.
This change was made because, with the special case, capturing groups
weren't getting updated if we had an end-of-string match.
2 weeks ago
Aadhavan Srinivasan
8c8e209587
Removed return values that weren't being used
2 weeks ago
Aadhavan Srinivasan
332c2fe5a2
Made lookarounds a little more efficient by only matching from (or to, in the case of lookbehind) the current index
2 weeks ago
Aadhavan Srinivasan
3fda07280e
Added more tests
2 weeks ago
Aadhavan Srinivasan
e2b08f8d5f
Updated TODO
2 weeks ago
Aadhavan Srinivasan
84cccc73ec
Added grouping tests
2 weeks ago
Aadhavan Srinivasan
437ca2ee57
Improved submatch tracking by storing all group indices as a part of the state, which is viewed as a 'thread'
2 weeks ago
Aadhavan Srinivasan
00902944f6
Added code to match capturing groups and store into a Group (used to be MatchIndex)
2 weeks ago
Aadhavan Srinivasan
80ea262064
Updated test-case structs to reflect the name of the new type
2 weeks ago
Aadhavan Srinivasan
f5eb9c8218
Defined postfixNodes for LPAREN and RPAREN
2 weeks ago
Aadhavan Srinivasan
20fbd20994
Added helper function to expand a slice to a given length
2 weeks ago
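A helper of this kind could look like the sketch below. The name `expandSlice` and the use of generics (Go 1.18+) are assumptions for illustration; the log does not show the commit's actual code.

```go
package main

// expandSlice returns s grown to at least n elements.
// Any newly added elements hold the zero value of T.
// A slice that is already long enough is returned unchanged.
func expandSlice[T any](s []T, n int) []T {
	if len(s) >= n {
		return s
	}
	// Append exactly the number of missing zero-valued elements.
	return append(s, make([]T, n-len(s))...)
}
```

For example, `expandSlice([]int{1, 2}, 4)` yields a slice of length 4 whose last two elements are zero, which is convenient when group indices are stored by position.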
Aadhavan Srinivasan
11f7f1d746
Added fields to state, to determine capturing group information. 0th group refers to entire match
2 weeks ago
Aadhavan Srinivasan
822d1f319f
Added initial support for capturing groups
2 weeks ago
Aadhavan Srinivasan
745fab9639
Clone lookaroundNFA when cloning a state; use compiled regex for lookarounds instead of compiling a new one
4 weeks ago
Aadhavan Srinivasan
34e9aedbd6
Compile lookaround regex to avoid compiling each time we want to use it
4 weeks ago
Aadhavan Srinivasan
6208f32710
Added support for numeric ranges: <5-38> will match all numbers between 5 and 38, inclusive on both ends. Also print line number on which matches occur, if we are in printing (and single line) mode
4 weeks ago
Aadhavan Srinivasan
cbd6ea136b
If the NFA starts with an assertion, make sure it's true before doing anything else. Also, check for last-state _lookaround_ rather than just last state, before breaking (instead of aborting) when the assertion fails
4 weeks ago
Aadhavan Srinivasan
eb6a044ecf
Added angle brackets to list of special characters (which need to be escaped to be used literally)
4 weeks ago
Aadhavan Srinivasan
393769f152
Accounted for last character being a newline when checking for EOS (we can be at the second-last character if the last one is a newline)
4 weeks ago
Aadhavan Srinivasan
e36310b32d
Added function (and helper functions) to generate a regex that matches all numbers in a range
4 weeks ago
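The log does not show how the range regex is generated. As a deliberately naive sketch, one could enumerate every number in the range as an alternation. The name `rangeToRegex`, the restriction to non-negative integers, and the `(?:...)` non-capturing group (Go `regexp` syntax, which this engine may or may not accept) are all assumptions.

```go
package main

import (
	"strconv"
	"strings"
)

// rangeToRegex returns a pattern that matches each decimal integer
// from lo to hi inclusive, written as one alternative per number.
// Assumes non-negative bounds; swaps them if given in reverse order.
func rangeToRegex(lo, hi int) string {
	if lo > hi {
		lo, hi = hi, lo
	}
	parts := make([]string, 0, hi-lo+1)
	for n := lo; n <= hi; n++ {
		parts = append(parts, strconv.Itoa(n))
	}
	return "(?:" + strings.Join(parts, "|") + ")"
}
```

The pattern grows linearly with the size of the range, so a real implementation would more likely build digit classes per position. Note also that the pattern must be anchored or bounded by the caller, since the alternative `5` would otherwise match inside `50`.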
Aadhavan Srinivasan
298285e44c
Added more test cases
4 weeks ago
Aadhavan Srinivasan
0de3a94ce3
Fixed bug with lookaheads: f(?=f) would not match anything in 'ffa', because of the 'a' at the end of the string. Fixed by checking if there are other last states when an assertion fails, rather than immediately aborting
4 weeks ago
Aadhavan Srinivasan
fe1136c54c
Fixed bug with parentheses in lookaround regex; fixed bug with reading last line of test string (if it doesn't end in a newline)
4 weeks ago
Aadhavan Srinivasan
25c333bea4
Added function to determine if a state is a lookaround
4 weeks ago
Aadhavan Srinivasan
74c177324b
Added more test cases
4 weeks ago
Aadhavan Srinivasan
7916629c4d
Added substitute flag - substitute matched text with given text
4 weeks ago
Aadhavan Srinivasan
ee02e7575e
Added function to generate all case variations of a rune
1 month ago
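One way such a function could be written is to walk the rune's case-folding orbit with the standard library's `unicode.SimpleFold`. The name `caseVariants` is hypothetical, and the commit's actual approach is not shown in this log.

```go
package main

import "unicode"

// caseVariants returns r followed by every other rune that is
// equivalent to it under Unicode simple case folding.
// SimpleFold cycles through the equivalence class, so the loop
// stops once it comes back around to r.
func caseVariants(r rune) []rune {
	out := []rune{r}
	for f := unicode.SimpleFold(r); f != r; f = unicode.SimpleFold(f) {
		out = append(out, f)
	}
	return out
}
```

For a rune with no case, such as a digit, the result contains only the rune itself. Some letters have more than two variants: `k` folds with both `K` and the Kelvin sign U+212A.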
Aadhavan Srinivasan
c87a4b7136
Added case-insensitive flag
1 month ago
Aadhavan Srinivasan
924e2a8dbc
Added some AI-generated test cases (llama3.1:405b)
1 month ago
Aadhavan Srinivasan
21b2d5a2a9
Added lookaround-related fields to postfixNode struct
1 month ago
Aadhavan Srinivasan
77d19cd84e
Added lookaround-related fields to State struct, added lookaround support to checkAssertion()
1 month ago
Aadhavan Srinivasan
051a8551f3
Match zero-length match at end of string, even if the start node is an assertion (end of string, lookarounds, etc.)
1 month ago
Aadhavan Srinivasan
11c0a0552f
Added support for lookarounds; parsing and adding nodes for different lookarounds
1 month ago
Aadhavan Srinivasan
c807d6664e
Changed generation of characters for non-whitespace, non-digit and non-word characters - it's basically an inverted character class now
1 month ago
Aadhavan Srinivasan
d986999001
Added more tests
1 month ago
Aadhavan Srinivasan
ea64ddc88a
Removed unnecessary duplication of assertion checking
1 month ago
Aadhavan Srinivasan
1ba871d618
Removed dotChars() function, moved notDotChars() setting to main()
1 month ago
Aadhavan Srinivasan
1a1a8f4f9c
Moved flag-checking after flag.Parse()
1 month ago
Aadhavan Srinivasan
e882f41400
Added fields to denote all the characters that an 'allChars' postfixNode _shouldn't_ represent (useful for inverting character classes)
1 month ago
Aadhavan Srinivasan
b3ee1fe5e8
Convert an inverting character class into an 'allChars' node, with the characters marked as exceptions
1 month ago
Aadhavan Srinivasan
708a9e1303
Added field to denote all characters which an 'allChars' node _shouldn't_ match (useful for inverting character classes)
1 month ago
Aadhavan Srinivasan
c694c47be7
Added flag to print match indices, and to enable multi-line mode
1 month ago
Aadhavan Srinivasan
2569f52552
Wrote toString function for MatchIndex
1 month ago
Aadhavan Srinivasan
160b2f9215
Added newline character as an escaped node
1 month ago
Aadhavan Srinivasan
992c5a9300
Replaced isAlphaNum() with isNormalChar(), which returns true if the character isn't special (also returns true for unicode characters, which the previous function didn't)
1 month ago
Aadhavan Srinivasan
1e0502c6aa
Added unicode tests
1 month ago
Aadhavan Srinivasan
c56d81a335
Added unicode support to dot metacharacter - it now matches _any_ unicode character (almost)
1 month ago
Aadhavan Srinivasan
8a1f1dc621
Added unicode support
Replaced strings with rune-slices, which capture unicode codepoints more
accurately.
1 month ago
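The difference between strings and rune slices can be illustrated with a small sketch. The helper name `toRunes` is hypothetical and is not taken from the repository.

```go
package main

// toRunes converts a string into a slice of Unicode code points.
// Indexing a Go string yields bytes, so a multi-byte character such
// as U+00E9 occupies two positions in the string but only one
// position in the resulting rune slice.
func toRunes(s string) []rune {
	return []rune(s)
}
```

With the input `"h\u00e9llo"`, `len` of the string is 6 (bytes in UTF-8), while `len(toRunes(...))` is 5, so match indices computed over the rune slice line up with characters rather than bytes.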
Aadhavan Srinivasan
805766a5ba
Added support for -l : only print lines with at least one match (or with exactly 0 matches, if -v is enabled)
1 month ago