regex

Commit Graph

Author	SHA1	Message	Date
Aadhavan Srinivasan	71cab59a89	Got rid of unnecessary special case to match at end-of-string. Instead, I tweaked the rest of the matching function, so that a special check isn't necessary. If we are trying to match at the end of a string, we skip any of the actual matching and proceed straight to finding 0-length matches. This change was made because, with the special case, capturing groups weren't getting updated if we had an end-of-string match.	2 weeks ago
Aadhavan Srinivasan	8c8e209587	Removed return values that weren't being used	2 weeks ago
Aadhavan Srinivasan	437ca2ee57	Improved submatch tracking by storing all group indices as a part of the state, which is viewed as a 'thread'	2 weeks ago
Aadhavan Srinivasan	00902944f6	Added code to match capturing groups and store into a Group (used to be MatchIndex)	2 weeks ago
Aadhavan Srinivasan	cbd6ea136b	If the NFA starts with an assertion, make sure it's true before doing anything else. Also, check for last-state _lookaround_ rather than just last state, before breaking (instead of aborting) when the assertion fails	4 weeks ago
Aadhavan Srinivasan	0de3a94ce3	Fixed bug with lookaheads: f(?=f) would not match anything in 'ffa', because of the 'a' at the end of the string. Fixed by checking if there are other last states when an assertion fails, rather than immediately aborting	4 weeks ago
Aadhavan Srinivasan	051a8551f3	Match zero-length match at end of string, even if the start node is an assertion (end of string, lookarounds, etc.)	1 month ago
Aadhavan Srinivasan	2569f52552	Wrote toString function for MatchIndex	1 month ago
Aadhavan Srinivasan	8a1f1dc621	Added Unicode support. Replaced strings with rune-slices, which capture Unicode code points more accurately.	1 month ago
Aadhavan Srinivasan	137ea3c746	Made findAllMatchesHelper non-recursive, added pruneIndices (improved performance) and more changes. I made findAllMatchesHelper a non-recursive function. It now only returns the first match it finds in the string (so I should probably rename it). These indices are collected by findAllMatches and pruned (to remove overlaps). The overlap function has also been rewritten, to make it (I believe) less than O(n^2). I also used the uniq_arr type to make checking for uniqueness O(1) instead of O(n) (as it was with unique_append()). This has resulted in massive performance gains. There have been a lot of changes here, and I probably haven't documented all of them.	2 months ago
Aadhavan Srinivasan	ea17251bf8	Might have made a change to improve performance	2 months ago
Aadhavan Srinivasan	1d9d1a5b81	Fixed calculation of overlapping (used to check for subset instead)	2 months ago
Aadhavan Srinivasan	dca81c1796	Replaced rune-slice parameters with string parameters in functions; avoids unnecessary conversion from strings to rune-slices	2 months ago
Aadhavan Srinivasan	315f68df12	Fixed typo	2 months ago
Aadhavan Srinivasan	360bdc8e11	Big rewrite - assertion handling, zero-match fixes, change in recursive calls. I added support for transitions. I wrote a function to determine if a given state has transitions for a character at a given point in the string. This helps me check if the current state has an assertion, and take actions based on that. I also fixed zero-length matching (almost, see todo.txt). It works for nearly all cases I could think of, although I still need to write more tests. I wrote a function to check if zero-length matches are possible with a given state. I also changed the way recursive calls work. Rather than passing a modified string, the function stores the location in the input string. This location is updated with each call to the function. Finally, the function now increments the offset by 1 instead of incrementing by the length of the longest match. This leads to a bit of overhead, e.g. if a regex matches index 1-5, then 1-5, 2-5, 3-5, 4-5 are all stored. To fix this, I wrote (and used) a function to check if a match overlaps with any matches in a slice.	2 months ago
Aadhavan Srinivasan	8e8e9e133f	Fixed matching greediness, e.g. a(a|b)*a would not match 'aaa' in 'aaab'	2 months ago
Aadhavan Srinivasan	4f2f14212c	Use contains function, since the content may have multiple characters	2 months ago
Aadhavan Srinivasan	df6efcd1f0	Unique append to match indices (ensure match indices aren't repeated)	2 months ago
Aadhavan Srinivasan	fe5c94b4df	Use new unique append to check if unique states have been added to tempStates	2 months ago
Aadhavan Srinivasan	13a57a4347	Stricter check for adding zero-length match at end of string	2 months ago
Aadhavan Srinivasan	cda0dfb0cc	Match empty string if start state is kleene star	2 months ago
Aadhavan Srinivasan	95654e3e34	Take all possible 0-states (until no more left to take) before checking if we are in an acceptable position	2 months ago
Aadhavan Srinivasan	c9fdf5aa6c	Restored old behavior with end-of-string - new one didn't seem to work well	2 months ago
Aadhavan Srinivasan	cd2b800b04	Fixed greediness of kleene star	2 months ago
Aadhavan Srinivasan	139c88dd58	Started working on '+' operator	2 months ago
Aadhavan Srinivasan	c894ee4c0d	Renamed match function to 'findAllMatches', to better represent what it does	2 months ago
Aadhavan Srinivasan	ce156c4405	Fixed kleene star matching at end of string - failed test a* and ppppppppaaaaaaaa	2 months ago
Aadhavan Srinivasan	9d786997df	Initial support for multiple matching	2 months ago
Aadhavan Srinivasan	60b798d904	Working on multiple matching	2 months ago
Aadhavan Srinivasan	11dd6aeb7c	More Kleene star fixes	2 months ago
Aadhavan Srinivasan	9d3bc2b804	Fixed kleene star behavior, which used to behave like a '+'	2 months ago
Aadhavan Srinivasan	bc11777ad5	Fixed Kleene Star matching	2 months ago
Aadhavan Srinivasan	d191686168	Rudimentary matching works	2 months ago
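
Several of the commits above (360bdc8e11, 137ea3c746) describe collecting every candidate match and then pruning the ones that overlap, with the stated aim of staying below O(n^2). The repository's own code is not shown on this page, so the following is only a minimal sketch of one way such pruning could work: sort the candidates, then keep the leftmost (and, on ties, longest) match in a single pass. The names `span` and `pruneOverlaps` and the half-open [start, end) convention are assumptions made for illustration, not the project's actual types or functions.

```go
package main

import (
	"fmt"
	"sort"
)

// span is a hypothetical half-open match range [start, end).
// It is an illustrative stand-in, not the repository's match type.
type span struct{ start, end int }

// pruneOverlaps (hypothetical helper) returns the matches that survive
// after dropping every match that overlaps an earlier, kept match.
// Sorting costs O(n log n); the scan afterwards is a single O(n) pass.
func pruneOverlaps(matches []span) []span {
	// Copy so the caller's slice is left untouched.
	sorted := append([]span(nil), matches...)

	// Leftmost start first; for equal starts, the longest match first.
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].start != sorted[j].start {
			return sorted[i].start < sorted[j].start
		}
		return sorted[i].end > sorted[j].end
	})

	var kept []span
	for _, m := range sorted {
		if len(kept) > 0 {
			last := kept[len(kept)-1]
			// Skip exact duplicates and anything starting inside the last kept match.
			if m == last || m.start < last.end {
				continue
			}
		}
		kept = append(kept, m)
	}
	return kept
}

func main() {
	// The case from the commit message: 1-5, 2-5, 3-5 and 4-5 are all stored.
	candidates := []span{{1, 5}, {2, 5}, {3, 5}, {4, 5}, {7, 9}}
	fmt.Println(pruneOverlaps(candidates)) // prints [{1 5} {7 9}]
}
```

In this sketch a zero-length match that begins exactly where the previous match ended is kept; whether the real matcher treats that case the same way is not stated in the log.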

33 Commits (93a5e24c8ded4d59bec27ec693bc30a54e28a077)