
457 Commits

SHA1 Message Date
b55b80ec6c Updated TODO
I didn't like the existing capturing group implementation, so I moved
it to a separate branch. This branch does not (at the moment) contain any
code relating to capturing groups.
2024-11-17 21:29:18 -05:00
137ea3c746 Made findAllMatchesHelper non-recursive, added pruneIndices (improved performance) and more changes
I made findAllMatchesHelper a non-recursive function. It now only
returns the first match it finds in the string (so I should probably
rename it).

These indices are collected by findAllMatches and pruned (to
remove overlaps). The overlap function has also been rewritten to make
it (I believe) less than O(n^2). I also used the uniq_arr type to make
checking for uniqueness O(1) instead of O(n) (as it was with
unique_append()). This has resulted in massive performance gains.

There's been a lot of changes here, and I probably haven't documented
all of them.
2024-11-07 16:16:50 -05:00
9201ed49bd Changed type from matchIndex to MatchIndex 2024-11-07 16:12:21 -05:00
9a073aa514 Added node types for left and right parentheses 2024-11-07 15:55:37 -05:00
7d265495f5 Got rid of list for uniq_arr (O(n) deletion) and instead have separate method to create list (O(n) list creation) 2024-11-07 15:55:13 -05:00
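The uniq_arr commits describe O(1) addition and lookup, with the list built on demand in O(n). In Go, a map used as a set gives exactly that profile (average case, since maps are hashed). This is a sketch assuming that design; `uniqArr` and its methods are hypothetical names, not the repository's code.

```go
package main

import "fmt"

// uniqArr is a hypothetical map-backed set: keys are the elements,
// and struct{} values take no extra space.
type uniqArr[T comparable] struct {
	elements map[T]struct{}
}

func newUniqArr[T comparable]() uniqArr[T] {
	return uniqArr[T]{elements: make(map[T]struct{})}
}

// add inserts values; re-adding an existing element is a no-op (average O(1) each).
func (u uniqArr[T]) add(vals ...T) {
	for _, v := range vals {
		u.elements[v] = struct{}{}
	}
}

// contains reports whether v is present (average O(1)).
func (u uniqArr[T]) contains(v T) bool {
	_, ok := u.elements[v]
	return ok
}

// remove deletes v (average O(1)); deleting a missing element is a no-op.
func (u uniqArr[T]) remove(v T) {
	delete(u.elements, v)
}

// values builds a slice of the elements in O(n). Map iteration order is unspecified.
func (u uniqArr[T]) values() []T {
	out := make([]T, 0, len(u.elements))
	for v := range u.elements {
		out = append(out, v)
	}
	return out
}

func main() {
	s := newUniqArr[int]()
	s.add(3, 1, 3, 2)
	s.remove(1)
	fmt.Println(s.contains(3), s.contains(1), len(s.values())) // true false 2
}
```

Keeping the map as the only storage avoids the O(n) deletion a parallel list would need.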
e2e99ff6a9 Added function to generate numbers in a range; added capacity to some slices to prevent unnecessary reallocations 2024-11-06 15:16:51 -05:00
8a69ea8cb7 Added unique array data structure - O(1) addition and retrieval (I think) 2024-11-06 15:15:44 -05:00
ea17251bf8 Might have made a change to improve performance 2024-11-04 08:42:26 -05:00
e8aca8606a Added test cases 2024-11-03 15:09:21 -05:00
9698c4f1d8 Fixed error in calculating word boundary (off-by-one) 2024-11-03 15:04:57 -05:00
c032dcb2ea Added more test cases 2024-11-03 15:04:19 -05:00
269e2d0e1c Updated go.mod 2024-11-03 14:38:46 -05:00
21142e6e13 Wrote function to clone the NFA starting at a given state, and a function to find question mark operator (a? == (a|)) 2024-11-03 14:37:38 -05:00
b602295bee Added support for specifying how often a postfixNode is repeated 2024-11-03 14:36:56 -05:00
1d9d1a5b81 Fixed calculation of overlapping (used to check for subset instead) 2024-11-03 14:36:23 -05:00
d8f52b8ccc Added support for numeric specifiers, moved question mark operator to its own function 2024-11-03 14:36:04 -05:00
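The equivalence a? == (a|) suggests one way to desugar a numeric specifier x{m,n}: m mandatory copies of x followed by n-m optional copies. The commits indicate the engine does this on the NFA by cloning states. This sketch works on pattern strings only for readability. `expandRepeat` is hypothetical, and it assumes the atom is a single character or an already-parenthesized group.

```go
package main

import (
	"fmt"
	"strings"
)

// expandRepeat (hypothetical) rewrites atom{minReps,maxReps} as minReps
// mandatory copies followed by (maxReps-minReps) optional copies,
// using x? == (x|). A maxReps of -1 means unbounded, {minReps,}, ending in (atom)*.
func expandRepeat(atom string, minReps, maxReps int) string {
	var b strings.Builder
	for i := 0; i < minReps; i++ {
		b.WriteString(atom)
	}
	if maxReps < 0 {
		b.WriteString("(" + atom + ")*")
		return b.String()
	}
	for i := minReps; i < maxReps; i++ {
		b.WriteString("(" + atom + "|)")
	}
	return b.String()
}

func main() {
	fmt.Println(expandRepeat("a", 2, 4))   // aa(a|)(a|)
	fmt.Println(expandRepeat("ab", 1, -1)) // ab(ab)*
}
```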
dca81c1796 Replaced rune-slice parameters with string parameters in functions; avoids unnecessary conversion from strings to rune-slices 2024-11-01 01:53:50 -04:00
723be527fb Updated TODO 2024-10-31 17:56:45 -04:00
fccd3a76f5 Wrote function to check if the assertion of a state is true 2024-10-31 17:56:04 -04:00
315f68df12 Fixed typo 2024-10-31 17:55:41 -04:00
fd957d9518 Added more test cases 2024-10-31 17:55:07 -04:00
19dc5064c8 Made conditions for word boundary a little more relaxed 2024-10-31 17:54:45 -04:00
a19d409796 Set node type to ASSERTION if the character represents an assertion 2024-10-31 17:14:56 -04:00
0736e813c1 Fixed boneheaded mistake with checking assertion types 2024-10-31 17:14:03 -04:00
1aff6e2fa4 Added a field to State that tells me what kind of assertion (if any) it is making. Also added a function to check if a state's contents contain a given value (checks assertions), and one to find all matches that a state has for a character 2024-10-31 17:13:34 -04:00
f3bf5e9740 Added function to check for word boundaries and delete an element from a slice 2024-10-31 17:09:25 -04:00
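A sketch of a per-state assertion kind together with a word-boundary check, since the off-by-one fix above shows how easy the edges are to get wrong. The enum names and `checkAssertion` are hypothetical. The word test assumes ASCII `\w` and indexes bytes, so non-ASCII bytes count as non-word.

```go
package main

import "fmt"

// assertType is a hypothetical enum for the kind of assertion a state makes.
type assertType int

const (
	noAssert           assertType = iota
	sosAssert                     // ^ : start of string
	eosAssert                     // $ : end of string
	wordBoundAssert               // \b
	nonWordBoundAssert            // \B
)

// isWordChar matches the ASCII \w class: letters, digits and underscore.
func isWordChar(c byte) bool {
	return c == '_' || ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z') || ('0' <= c && c <= '9')
}

// isWordBoundary reports whether position idx (between str[idx-1] and str[idx])
// separates a word character from a non-word character. Positions outside the
// string count as non-word, which handles idx == 0 and idx == len(str).
func isWordBoundary(str string, idx int) bool {
	before := idx > 0 && isWordChar(str[idx-1])
	after := idx < len(str) && isWordChar(str[idx])
	return before != after
}

// checkAssertion reports whether assertion a holds at position idx of str.
func checkAssertion(a assertType, str string, idx int) bool {
	switch a {
	case sosAssert:
		return idx == 0
	case eosAssert:
		return idx == len(str)
	case wordBoundAssert:
		return isWordBoundary(str, idx)
	case nonWordBoundAssert:
		return !isWordBoundary(str, idx)
	default:
		return true // states without an assertion always pass
	}
}

func main() {
	s := "hi there"
	fmt.Println(checkAssertion(wordBoundAssert, s, 2))    // true
	fmt.Println(checkAssertion(nonWordBoundAssert, s, 1)) // true
	fmt.Println(checkAssertion(eosAssert, s, len(s)))     // true
}
```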
20db62c596 Got rid of function that I don't need anymore 2024-10-31 17:09:02 -04:00
360bdc8e11 Big rewrite - assertion handling, zero-match fixes, change in recursive calls
I added support for transitions. I wrote a function to determine if
a given state has transitions for a character at a given point in the
string. This helps me check if the current state has an assertion, and
take actions based on that.

I also fixed zero-length matching (almost, see todo.txt). It works for
nearly all cases I could think of, although I still need to write more
tests. I wrote a function to check if zero-length matches are possible
with a given state.

I also changed the way recursive calls work. Rather than passing a
modified string, the function stores the location in the input string.
This location is updated with each call to the function.

Finally, the function now increments the offset by 1 instead of
incrementing by the length of the longest match. This leads to a bit of
overhead, e.g. if a regex matches indices 1-5, then 1-5, 2-5, 3-5 and 4-5 are
all stored. To fix this, I wrote (and used) a function to check if
a match overlaps with any matches in a slice.
2024-10-31 17:06:32 -04:00
8dbecde3ae Added support for detecting assertion characters; changed input so that newline isn't required 2024-10-31 16:51:53 -04:00
a752491563 Added more test cases 2024-10-31 16:47:59 -04:00
656c506aa8 Wrote function to provide correct node for escaped character 2024-10-30 09:33:52 -04:00
1bafdcdb7e Added support for inverted matches; moved escape character detection to its own function 2024-10-30 09:33:25 -04:00
5f4a6c5a3b Added constants for LBRACKET and RBRACKET 2024-10-30 09:32:50 -04:00
e6c607319c Added more tests 2024-10-30 09:32:32 -04:00
8e8e9e133f Fixed matching greediness, e.g. a(a|b)*a would not match 'aaa' in 'aaab' 2024-10-29 20:07:30 -04:00
a619fd24f6 Added map and reduce functions, and a function to return the difference between two sets 2024-10-29 20:06:09 -04:00
f8ee1b3200 Added more tests 2024-10-29 20:05:42 -04:00
a66e8f1c08 Concatenate every character if it is escaped 2024-10-29 20:05:30 -04:00
d8299294ed Added test cases 2024-10-29 14:41:00 -04:00
45d348e7f4 Updated TODO 2024-10-29 10:08:41 -04:00
7b815343f4 Removed exclamation mark in inverted metacharacters - had the opposite effect because of the way deleteFunc works 2024-10-29 10:07:55 -04:00
1a7fd12569 Added support for some escaped metacharacters 2024-10-29 10:05:39 -04:00
b8d5ea0897 Wrote function to create a character node regardless of the contents of the node 2024-10-29 10:05:01 -04:00
445a7247f8 Defined variables to provide ranges of characters for metacharacters 2024-10-29 10:04:36 -04:00
ca945c7740 Added support for character ranges and dot metacharacter 2024-10-29 00:26:11 -04:00
2af4a5f9fd Added more tests 2024-10-29 00:25:38 -04:00
76157af2b8 Wrote function to generate rune slice representing valid dot metacharacter values 2024-10-29 00:25:30 -04:00
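A sketch of building character sets as rune slices, covering the bracket ranges, metacharacter ranges and dot values from these commits. Two assumptions apply: the classes are restricted to ASCII, and the dot matches printable ASCII plus tab but not newline. The real engine's sets may differ, and all function names here are hypothetical.

```go
package main

import "fmt"

// runeRange (hypothetical) returns every rune from lo to hi inclusive,
// as used for a bracket range such as [a-z].
func runeRange(lo, hi rune) []rune {
	if hi < lo {
		return nil
	}
	out := make([]rune, 0, hi-lo+1)
	for r := lo; r <= hi; r++ {
		out = append(out, r)
	}
	return out
}

// digitClass is \d restricted to ASCII.
func digitClass() []rune { return runeRange('0', '9') }

// wordClass is \w restricted to ASCII: letters, digits and underscore.
func wordClass() []rune {
	chars := runeRange('a', 'z')
	chars = append(chars, runeRange('A', 'Z')...)
	chars = append(chars, digitClass()...)
	return append(chars, '_')
}

// dotChars assumes the dot matches printable ASCII (' ' to '~') plus tab,
// but not newline.
func dotChars() []rune {
	return append(runeRange(' ', '~'), '\t')
}

func main() {
	fmt.Println(string(runeRange('a', 'e')))                        // abcde
	fmt.Println(len(digitClass()), len(wordClass()), len(dotChars())) // 10 63 96
}
```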
96b3009c14 Updated TODO 2024-10-28 17:40:03 -04:00
444413e1f7 Added postfixNode type to represent a node in the postfix representation of the regex 2024-10-28 17:39:32 -04:00
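A common way to produce a postfix form of a regex, as in Thompson-style NFA construction, is to insert an explicit concatenation operator and then run the shunting-yard algorithm. This sketch works on plain runes and handles only literals, `|`, `*`, `+`, `?` and parentheses. It uses `~` as a hypothetical concatenation marker and assumes balanced input with no escapes. The repository's `postfixNode` type carries more information and is not reproduced here.

```go
package main

import "fmt"

const concat = '~' // hypothetical explicit concatenation operator

// Postfix operators (*, +, ?) bind tightest; concatenation binds tighter than '|'.
var precedence = map[rune]int{'|': 1, concat: 2}

// addConcat inserts concat wherever two items are implicitly concatenated,
// e.g. "ab*c" becomes "a~b*~c".
func addConcat(re string) []rune {
	rs := []rune(re)
	var out []rune
	for i, c := range rs {
		out = append(out, c)
		if i+1 == len(rs) {
			break
		}
		next := rs[i+1]
		leftEnds := c != '(' && c != '|'
		rightStarts := next != ')' && next != '|' && next != '*' && next != '+' && next != '?'
		if leftEnds && rightStarts {
			out = append(out, concat)
		}
	}
	return out
}

// toPostfix converts the concat-annotated regex to postfix using shunting-yard.
func toPostfix(re string) string {
	var out, ops []rune
	for _, c := range addConcat(re) {
		switch c {
		case '*', '+', '?':
			out = append(out, c) // unary postfix operators already follow their operand
		case '(':
			ops = append(ops, c)
		case ')':
			for len(ops) > 0 && ops[len(ops)-1] != '(' {
				out = append(out, ops[len(ops)-1])
				ops = ops[:len(ops)-1]
			}
			ops = ops[:len(ops)-1] // discard the matching '(' (input assumed balanced)
		case '|', concat:
			// Both binary operators are left-associative: pop equal or higher precedence.
			for len(ops) > 0 && ops[len(ops)-1] != '(' && precedence[ops[len(ops)-1]] >= precedence[c] {
				out = append(out, ops[len(ops)-1])
				ops = ops[:len(ops)-1]
			}
			ops = append(ops, c)
		default:
			out = append(out, c) // literal
		}
	}
	for len(ops) > 0 {
		out = append(out, ops[len(ops)-1])
		ops = ops[:len(ops)-1]
	}
	return string(out)
}

func main() {
	fmt.Println(toPostfix("a(a|b)*a")) // aab|*~a~
}
```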
74c6a2e195 Added more functions to stateContents type, removed append because I don't think I need it 2024-10-28 17:39:14 -04:00