27 Commits (051a8551f3b3a6773fb38dabe43993ea788c9640)

Author SHA1 Message Date
Aadhavan Srinivasan 051a8551f3 Match zero-length match at end of string, even if the start node is an assertion (end of string, lookarounds, etc.) 1 month ago
Aadhavan Srinivasan 2569f52552 Wrote toString function for MatchIndex 1 month ago
Aadhavan Srinivasan 8a1f1dc621 Added unicode support
Replaced strings with rune-slices, which capture unicode codepoints more
accurately.
1 month ago
Aadhavan Srinivasan 137ea3c746 Made findAllMatchesHelper non-recursive, added pruneIndices (improved performance) and more changes
I made findAllMatchesHelper a non-recursive function. It now only
returns the first match it finds in the string (so I should probably
rename it).

These indices are collected by findAllMatches and pruned (to
remove overlaps). The overlap function has also been rewritten, to make
it (I believe) less than O(n^2). I also used the uniq_arr type to make
checking for uniqueness O(1) instead of O(n) (as it was with
unique_append()). This has resulted in massive performance gains.

There have been a lot of changes here, and I probably haven't documented
all of them.
2 months ago
Aadhavan Srinivasan ea17251bf8 Might have made a change to improve performance 2 months ago
Aadhavan Srinivasan 1d9d1a5b81 Fixed calculation of overlapping (used to check for subset instead) 2 months ago
Aadhavan Srinivasan dca81c1796 Replaced rune-slice parameters with string parameters in functions; avoids unnecessary conversion from strings to rune-slices 2 months ago
Aadhavan Srinivasan 315f68df12 Fixed typo 2 months ago
Aadhavan Srinivasan 360bdc8e11 Big rewrite - assertion handling, zero-match fixes, change in recursive calls
I added support for transitions. I wrote a function to determine if
a given state has transitions for a character at a given point in the
string. This helps me check if the current state has an assertion, and
take actions based on that.

I also fixed zero-length matching (almost, see todo.txt). It works for
nearly all cases I could think of, although I still need to write more
tests. I wrote a function to check if zero-length matches are possible
with a given state.

I also changed the way recursive calls work. Rather than passing a
modified string, the function stores the location in the input string.
This location is updated with each call to the function.

Finally, the function now increments the offset by 1 instead of
incrementing by the length of the longest match. This leads to a bit of
overhead, e.g. if a regex matches index 1-5, then 1-5, 2-5, 3-5, 4-5 are
all stored. To fix this, I wrote (and used) a function to check if
a match overlaps with any matches in a slice.
2 months ago
Aadhavan Srinivasan 8e8e9e133f Fixed matching greediness, e.g. a(a|b)*a would not match 'aaa' in 'aaab' 2 months ago
Aadhavan Srinivasan 4f2f14212c Use contains function, since the content may have multiple characters 2 months ago
Aadhavan Srinivasan df6efcd1f0 Unique append to match indices (ensure match indices aren't repeated) 2 months ago
Aadhavan Srinivasan fe5c94b4df Use new unique append to check if unique states have been added to tempStates 2 months ago
Aadhavan Srinivasan 13a57a4347 Stricter check for adding zero-length match at end of string 2 months ago
Aadhavan Srinivasan cda0dfb0cc Match empty string if start state is kleene star 2 months ago
Aadhavan Srinivasan 95654e3e34 Take all possible 0-states (until no more left to take) before checking if we are in an acceptable position 2 months ago
Aadhavan Srinivasan c9fdf5aa6c Restored old behavior with end-of-string - new one didn't seem to work well 2 months ago
Aadhavan Srinivasan cd2b800b04 Fixed greediness of kleene star 2 months ago
Aadhavan Srinivasan 139c88dd58 Started working on '+' operator 2 months ago
Aadhavan Srinivasan c894ee4c0d Renamed match function to 'findAllMatches', to better represent what it does 2 months ago
Aadhavan Srinivasan ce156c4405 Fixed kleene star matching at end of string - failed test a* and ppppppppaaaaaaaa 2 months ago
Aadhavan Srinivasan 9d786997df Initial support for multiple matching 2 months ago
Aadhavan Srinivasan 60b798d904 Working on multiple matching 2 months ago
Aadhavan Srinivasan 11dd6aeb7c More Kleene star fixes 2 months ago
Aadhavan Srinivasan 9d3bc2b804 Fixed kleene star behavior, which used to behave like a '+' 2 months ago
Aadhavan Srinivasan bc11777ad5 Fixed Kleene Star matching 2 months ago
Aadhavan Srinivasan d191686168 Rudimentary matching works 2 months ago
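
Several of the commits above (360bdc8e11, 1d9d1a5b81, 137ea3c746) describe collecting candidate matches and then pruning the ones that overlap. The following is only a rough sketch of how such a pruning step might look, not the code from the repository. The Match type, its Start/End fields (treated as a half-open range), and the function names overlaps and pruneOverlaps are all hypothetical names chosen for illustration.

```go
package main

import "sort"

// Match is a hypothetical stand-in for the repository's match-index type.
// Start is inclusive and End is exclusive (an assumption made for this sketch).
type Match struct {
	Start, End int
}

// overlaps reports whether two matches share at least one position.
// Note that this is a true overlap test, not a subset test.
func overlaps(a, b Match) bool {
	return a.Start < b.End && b.Start < a.End
}

// pruneOverlaps returns the matches that remain after dropping every match
// that overlaps an earlier-starting (or, on ties, longer) match.
// Sorting dominates the cost, so this runs in O(n log n) rather than O(n^2).
func pruneOverlaps(matches []Match) []Match {
	// Work on a copy so the caller's slice is left untouched.
	sorted := append([]Match(nil), matches...)
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].Start != sorted[j].Start {
			return sorted[i].Start < sorted[j].Start
		}
		// Same start: prefer the longer match.
		return sorted[i].End > sorted[j].End
	})

	var kept []Match
	for _, m := range sorted {
		if len(kept) > 0 {
			last := kept[len(kept)-1]
			// The kept matches are disjoint and ordered by start, so the
			// last one has the largest End; comparing against it suffices.
			if m == last || overlaps(last, m) {
				continue
			}
		}
		kept = append(kept, m)
	}
	return kept
}
```

With the example from the 360bdc8e11 message, calling pruneOverlaps on the matches 1-5, 2-5, 3-5 and 4-5 keeps only 1-5. Sorting first is one possible design choice; it lets each candidate be compared against a single previously kept match instead of all of them.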