33 Commits (8d6e1a41a5b9d8ec673b50ffafb4b5c5041630a5)

Author SHA1 Message Date
Aadhavan Srinivasan 71cab59a89 Got rid of unnecessary special case to match at end-of-string
Instead, I tweaked the rest of the matching function, so that a special
check isn't necessary. If we are trying to match at the end of a string,
we skip any of the actual matching and proceed straight to finding
0-length matches.

This change was made because, with the special case, capturing groups
weren't getting updated if we had an end-of-string match.
2 weeks ago
Aadhavan Srinivasan 8c8e209587 Removed return values that weren't being used 2 weeks ago
Aadhavan Srinivasan 437ca2ee57 Improved submatch tracking by storing all group indices as a part of the state, which is viewed as a 'thread' 2 weeks ago
Aadhavan Srinivasan 00902944f6 Added code to match capturing groups and store into a Group (used to be MatchIndex) 2 weeks ago
Aadhavan Srinivasan cbd6ea136b If the NFA starts with an assertion, make sure it's true before doing anything else. Also, check for last-state _lookaround_ rather than just last state, before breaking (instead of aborting) when the assertion fails 4 weeks ago
Aadhavan Srinivasan 0de3a94ce3 Fixed bug with lookaheads: f(?=f) would not match anything in 'ffa', because of the 'a' at the end of the string. Fixed by checking if there are other last states when an assertion fails, rather than immediately aborting 4 weeks ago
Aadhavan Srinivasan 051a8551f3 Match zero-length match at end of string, even if the start node is an assertion (end of string, lookarounds, etc.) 1 month ago
Aadhavan Srinivasan 2569f52552 Wrote toString function for MatchIndex 1 month ago
Aadhavan Srinivasan 8a1f1dc621 Added unicode support
Replaced strings with rune-slices, which capture unicode codepoints more
accurately.
1 month ago
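
A minimal sketch of why rune-slices track Unicode code points more accurately than strings. It is illustrative only and not taken from the repository; the helper name `runeLen` is hypothetical. Indexing a Go string yields bytes, so a multi-byte character spans several indices, whereas a `[]rune` holds one element per code point.

```go
package main

import "fmt"

// runeLen is a hypothetical helper: it returns the number of Unicode
// code points in s, as opposed to len(s), which counts UTF-8 bytes.
func runeLen(s string) int {
	return len([]rune(s))
}

func main() {
	s := "naïve" // 'ï' occupies two bytes in UTF-8
	fmt.Println(len(s), runeLen(s)) // prints "6 5"
}
```

With a rune-slice, match indices refer to characters rather than bytes, at the cost of one conversion from the input string.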
Aadhavan Srinivasan 137ea3c746 Made findAllMatchesHelper non-recursive, added pruneIndices (improved performance) and more changes
I made findAllMatchesHelper a non-recursive function. It now only
returns the first match it finds in the string (so I should probably
rename it).

These indices are collected by findAllMatches and pruned (to
remove overlaps). The overlap function has also been rewritten, to make
it (I believe) less than O(n^2). I also used the uniq_arr type to make
checking for uniqueness O(1) instead of O(n) (as it was with
unique_append()). This has resulted in massive performance gains.

There have been a lot of changes here, and I probably haven't documented
all of them.
2 months ago
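
The constant-time uniqueness check described above can be sketched as a slice paired with a set. This is an assumption about how a type like uniq_arr might work, not the repository's actual code; `uniqSlice`, `newUniqSlice` and `add` are hypothetical names, and the sketch assumes Go 1.18 or later for generics.

```go
package main

import "fmt"

// uniqSlice is a hypothetical stand-in for the uniq_arr type named in the
// commit message. It keeps insertion order in a slice and uses a map as a
// set, so a membership check is O(1) on average instead of an O(n) scan.
type uniqSlice[T comparable] struct {
	items []T
	seen  map[T]struct{}
}

func newUniqSlice[T comparable]() *uniqSlice[T] {
	return &uniqSlice[T]{seen: make(map[T]struct{})}
}

// add appends v only if it is not already present and reports whether
// the value was appended.
func (u *uniqSlice[T]) add(v T) bool {
	if _, ok := u.seen[v]; ok {
		return false
	}
	u.seen[v] = struct{}{}
	u.items = append(u.items, v)
	return true
}

func main() {
	u := newUniqSlice[int]()
	for _, v := range []int{3, 1, 3, 2, 1} {
		u.add(v)
	}
	fmt.Println(u.items) // duplicates are skipped, order is preserved
}
```

The trade-off is extra memory for the map in exchange for avoiding a linear scan on every append.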
Aadhavan Srinivasan ea17251bf8 Might have made a change to improve performance 2 months ago
Aadhavan Srinivasan 1d9d1a5b81 Fixed calculation of overlapping (used to check for subset instead) 2 months ago
Aadhavan Srinivasan dca81c1796 Replaced rune-slice parameters with string parameters in functions; avoids unnecessary conversion from strings to rune-slices 2 months ago
Aadhavan Srinivasan 315f68df12 Fixed typo 2 months ago
Aadhavan Srinivasan 360bdc8e11 Big rewrite - assertion handling, zero-match fixes, change in recursive calls
I added support for transitions. I wrote a function to determine if
a given state has transitions for a character at a given point in the
string. This helps me check if the current state has an assertion, and
take actions based on that.

I also fixed zero-length matching (almost, see todo.txt). It works for
nearly all cases I could think of, although I still need to write more
tests. I wrote a function to check if zero-length matches are possible
with a given state.

I also changed the way recursive calls work. Rather than passing a
modified string, the function stores the location in the input string.
This location is updated with each call to the function.

Finally, the function now increments the offset by 1 instead of
incrementing by the length of the longest match. This leads to a bit of
overhead, e.g. if a regex matches index 1-5, then 1-5, 2-5, 3-5, 4-5 are
all stored. To fix this, I wrote (and used) a function to check if
a match overlaps with any matches in a slice.
2 months ago
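
The overlap check mentioned above can be sketched as follows. This is illustrative rather than the code from this commit: the `span` type and the `overlaps`, `overlapsAny` and `prune` functions are hypothetical, and the sketch assumes half-open intervals [start, end) and that earlier matches take priority over later ones.

```go
package main

import "fmt"

// span is a hypothetical match record covering the half-open range
// [start, end) of the input.
type span struct{ start, end int }

// overlaps reports whether a and b share at least one position.
// Note this is an intersection test, not a subset test.
func overlaps(a, b span) bool {
	return a.start < b.end && b.start < a.end
}

// overlapsAny reports whether m overlaps any span in kept.
func overlapsAny(m span, kept []span) bool {
	for _, k := range kept {
		if overlaps(m, k) {
			return true
		}
	}
	return false
}

// prune keeps each match only if it does not overlap a match that was
// kept before it.
func prune(matches []span) []span {
	var kept []span
	for _, m := range matches {
		if !overlapsAny(m, kept) {
			kept = append(kept, m)
		}
	}
	return kept
}

func main() {
	found := []span{{1, 5}, {2, 5}, {3, 5}, {4, 5}, {7, 9}}
	fmt.Println(prune(found)) // prints "[{1 5} {7 9}]"
}
```

This simple version is quadratic in the number of matches in the worst case; sorting matches by start position first would allow comparing each match only against the last one kept.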
Aadhavan Srinivasan 8e8e9e133f Fixed matching greediness, e.g. a(a|b)*a would not match 'aaa' in 'aaab' 2 months ago
Aadhavan Srinivasan 4f2f14212c Use contains function, since the content may have multiple characters 2 months ago
Aadhavan Srinivasan df6efcd1f0 Unique append to match indices (ensure match indices aren't repeated) 2 months ago
Aadhavan Srinivasan fe5c94b4df Use new unique append to check if unique states have been added to tempStates 2 months ago
Aadhavan Srinivasan 13a57a4347 Stricter check for adding zero-length match at end of string 2 months ago
Aadhavan Srinivasan cda0dfb0cc Match empty string if start state is kleene star 2 months ago
Aadhavan Srinivasan 95654e3e34 Take all possible 0-states (until no more left to take) before checking if we are in an acceptable position 2 months ago
Aadhavan Srinivasan c9fdf5aa6c Restored old behavior with end-of-string - new one didn't seem to work well 2 months ago
Aadhavan Srinivasan cd2b800b04 Fixed greediness of kleene star 2 months ago
Aadhavan Srinivasan 139c88dd58 Started working on '+' operator 2 months ago
Aadhavan Srinivasan c894ee4c0d Renamed match function to 'findAllMatches', to better represent what it does 2 months ago
Aadhavan Srinivasan ce156c4405 Fixed kleene star matching at end of string - failed test a* and ppppppppaaaaaaaa 2 months ago
Aadhavan Srinivasan 9d786997df Initial support for multiple matching 2 months ago
Aadhavan Srinivasan 60b798d904 Working on multiple matching 2 months ago
Aadhavan Srinivasan 11dd6aeb7c More Kleene star fixes 2 months ago
Aadhavan Srinivasan 9d3bc2b804 Fixed kleene star behavior, which used to behave like a '+' 2 months ago
Aadhavan Srinivasan bc11777ad5 Fixed Kleene Star matching 2 months ago
Aadhavan Srinivasan d191686168 Rudimentary matching works 2 months ago