24 Commits (11641596fa34073e8c137dd4019793ebc2d31424)

Author SHA1 Message Date
Aadhavan Srinivasan 137ea3c746 Made findAllMatchesHelper non-recursive, added pruneIndices (improved performance) and more changes
I made findAllMatchesHelper a non-recursive function. It now only
returns the first match it finds in the string (so I should probably
rename it).

These indices are collected by findAllMatches and pruned (to
remove overlaps). The overlap function has also been rewritten, to make
it (I believe) less than O(n^2). I also used the uniq_arr type to make
checking for uniqueness O(1) instead of O(n) (as it was with
unique_append()). This has resulted in massive performance gains.

There have been a lot of changes here, and I probably haven't documented
all of them.
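
The commit message above mentions two ideas: constant-time uniqueness checks and pruning overlapping match indices. The following is a minimal sketch of both, not the code from this commit. The names `matchIndex`, `uniqSet` and `pruneOverlaps` are hypothetical, matches are assumed to be half-open spans [start, end), and the map-based set is only a stand-in for whatever `uniq_arr` actually does.

```go
package main

import (
	"fmt"
	"sort"
)

// matchIndex is a hypothetical half-open match span [start, end).
type matchIndex struct {
	start, end int
}

// uniqSet is a hypothetical stand-in for a uniq_arr-style type.
// A map used as a set gives average O(1) membership checks,
// whereas scanning a slice for duplicates is O(n).
type uniqSet map[matchIndex]struct{}

// add inserts m and reports whether it was not already present.
func (s uniqSet) add(m matchIndex) bool {
	if _, ok := s[m]; ok {
		return false
	}
	s[m] = struct{}{}
	return true
}

// pruneOverlaps sorts a copy of the spans by start (longer span first on ties)
// and keeps a span only if it begins at or after the end of the last kept span.
// The sort dominates, so this runs in O(n log n) rather than O(n^2).
func pruneOverlaps(matches []matchIndex) []matchIndex {
	sorted := append([]matchIndex(nil), matches...)
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].start != sorted[j].start {
			return sorted[i].start < sorted[j].start
		}
		return sorted[i].end > sorted[j].end
	})
	var kept []matchIndex
	for _, m := range sorted {
		if len(kept) == 0 || m.start >= kept[len(kept)-1].end {
			kept = append(kept, m)
		}
	}
	return kept
}

func main() {
	seen := uniqSet{}
	var all []matchIndex
	// The duplicate {1, 5} is rejected by the set before pruning.
	for _, m := range []matchIndex{{1, 5}, {2, 5}, {1, 5}, {7, 9}} {
		if seen.add(m) {
			all = append(all, m)
		}
	}
	fmt.Println(pruneOverlaps(all)) // [{1 5} {7 9}]
}
```

Sorting first means each span only has to be compared with the most recently kept one, which is one way to get below quadratic time; the actual rewrite may use a different approach.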
Aadhavan Srinivasan ea17251bf8 Might have made a change to improve performance
Aadhavan Srinivasan 1d9d1a5b81 Fixed calculation of overlapping (used to check for subset instead)
Aadhavan Srinivasan dca81c1796 Replaced rune-slice parameters with string parameters in functions; avoids unnecessary conversion from strings to rune-slices
Aadhavan Srinivasan 315f68df12 Fixed typo
Aadhavan Srinivasan 360bdc8e11 Big rewrite - assertion handling, zero-match fixes, change in recursive calls
I added support for transitions. I wrote a function to determine if
a given state has transitions for a character at a given point in the
string. This helps me check if the current state has an assertion, and
take actions based on that.

I also fixed zero-length matching (almost, see todo.txt). It works for
nearly all cases I could think of, although I still need to write more
tests. I wrote a function to check if zero-length matches are possible
with a given state.

I also changed the way recursive calls work. Rather than passing a
modified string, the function stores the location in the input string.
This location is updated with each call to the function.

Finally, the function now increments the offset by 1 instead of
incrementing by the length of the longest match. This leads to a bit of
overhead, e.g. if a regex matches index 1-5, then 1-5, 2-5, 3-5, 4-5 are
all stored. To fix this, I wrote (and used) a function to check if
a match overlaps with any matches in a slice.
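
The overlap check described in the last paragraph of this commit message can be illustrated roughly as follows. This is a sketch under assumptions, not the function from the commit: `span`, `overlapsAny` and `filterOverlaps` are hypothetical names, and spans are assumed to be half-open ranges [start, end).

```go
package main

import "fmt"

// span is a hypothetical half-open match range [start, end).
type span struct{ start, end int }

// overlapsAny reports whether s shares at least one position
// with any span already in kept.
func overlapsAny(s span, kept []span) bool {
	for _, k := range kept {
		if s.start < k.end && k.start < s.end {
			return true
		}
	}
	return false
}

// filterOverlaps walks the candidates in order and drops any candidate
// that overlaps a span kept earlier.
func filterOverlaps(candidates []span) []span {
	var kept []span
	for _, c := range candidates {
		if !overlapsAny(c, kept) {
			kept = append(kept, c)
		}
	}
	return kept
}

func main() {
	// Candidates as they might be produced when the offset advances
	// by one position per call instead of by the match length.
	candidates := []span{{1, 5}, {2, 5}, {3, 5}, {4, 5}, {6, 8}}
	fmt.Println(filterOverlaps(candidates)) // [{1 5} {6 8}]
}
```

Because every candidate is compared against every kept span, this check is quadratic in the worst case, which is acceptable for a sketch but costly on inputs with many matches.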
Aadhavan Srinivasan 8e8e9e133f Fixed matching greediness, e.g. a(a|b)*a would not match 'aaa' in 'aaab'
Aadhavan Srinivasan 4f2f14212c Use contains function, since the content may have multiple characters
Aadhavan Srinivasan df6efcd1f0 Unique append to match indices (ensure match indices aren't repeated)
Aadhavan Srinivasan fe5c94b4df Use new unique append to check if unique states have been added to tempStates
Aadhavan Srinivasan 13a57a4347 Stricter check for adding zero-length match at end of string
Aadhavan Srinivasan cda0dfb0cc Match empty string if start state is kleene star
Aadhavan Srinivasan 95654e3e34 Take all possible 0-states (until no more left to take) before checking if we are in an acceptable position
Aadhavan Srinivasan c9fdf5aa6c Restored old behavior with end-of-string - new one didn't seem to work well
Aadhavan Srinivasan cd2b800b04 Fixed greediness of kleene star
Aadhavan Srinivasan 139c88dd58 Started working on '+' operator
Aadhavan Srinivasan c894ee4c0d Renamed match function to 'findAllMatches', to better represent what it does
Aadhavan Srinivasan ce156c4405 Fixed kleene star matching at end of string - failed test a* and ppppppppaaaaaaaa
Aadhavan Srinivasan 9d786997df Initial support for multiple matching
Aadhavan Srinivasan 60b798d904 Working on multiple matching
Aadhavan Srinivasan 11dd6aeb7c More Kleene star fixes
Aadhavan Srinivasan 9d3bc2b804 Fixed kleene star behavior, which used to behave like a '+'
Aadhavan Srinivasan bc11777ad5 Fixed Kleene Star matching
Aadhavan Srinivasan d191686168 Rudimentary matching works