Commit Graph

115 Commits

SHA1 Message Date
992c5a9300 Replaced isAlphaNum() with isNormalChar(), which returns true if the character isn't special (it also returns true for unicode characters, which the previous function didn't) 2024-11-20 00:24:43 -05:00
1e0502c6aa Added unicode tests 2024-11-20 00:23:57 -05:00
c56d81a335 Added unicode support to dot metacharacter - it now matches _any_ unicode character (almost) 2024-11-18 16:44:43 -05:00
8a1f1dc621 Added unicode support
Replaced strings with rune-slices, which capture unicode codepoints more
accurately.
2024-11-18 10:41:50 -05:00
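The reason rune-slices capture codepoints more accurately is easiest to see by comparing byte and codepoint counts. This is a minimal Go sketch; the helper name `counts` is hypothetical and not from the project. `len` on a string counts UTF-8 bytes, while converting to `[]rune` decodes the string into Unicode code points.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts returns the number of bytes and the number of Unicode code points
// (runes) in s. len(s) counts UTF-8 bytes; []rune(s) decodes code points.
func counts(s string) (bytes, runes int) {
	return len(s), len([]rune(s))
}

func main() {
	s := "héllo"
	b, r := counts(s)
	fmt.Println(b, r)                      // 6 5
	fmt.Println(utf8.RuneCountInString(s)) // 5, without allocating a slice
	// Indexing the string yields one byte of 'é';
	// indexing the rune slice yields the whole code point.
	fmt.Println(s[1], string([]rune(s)[1])) // 195 é
}
```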
805766a5ba Added support for -l : only print lines with at least one match (or with exactly 0 matches, if -v is enabled) 2024-11-18 10:02:34 -05:00
dcd712dceb Added support for -o flag: only print matching content 2024-11-18 09:36:16 -05:00
f2b8812b05 Added support for -v flag, to invert which values are printed in color. Also got rid of unnecessary 'else' clause 2024-11-17 22:19:55 -05:00
11641596fa Read multiple lines from stdin and apply the regex to each one; converted the array of matchIndex structs into a flat array of indices, which speeds up checking whether a character has to be printed in color 2024-11-17 21:49:11 -05:00
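This is a minimal sketch of the flat-index idea, assuming half-open match ranges. Every matched range is expanded into its individual indices, and a per-position mask then answers "print this character in color?" in O(1). The struct fields, the helper names `genRange`, `flatten` and `highlight`, and the use of ANSI red are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// matchIndex is a hypothetical half-open match range [start, end).
type matchIndex struct{ start, end int }

// genRange returns start, start+1, ..., end-1 (assumes start <= end).
func genRange(start, end int) []int {
	out := make([]int, 0, end-start)
	for i := start; i < end; i++ {
		out = append(out, i)
	}
	return out
}

// flatten turns match ranges into one flat list of every matched index.
func flatten(ms []matchIndex) []int {
	var out []int
	for _, m := range ms {
		out = append(out, genRange(m.start, m.end)...)
	}
	return out
}

// highlight wraps each matched rune in ANSI red, using a per-position mask
// built from the flat index list (indices assumed to be within the line).
func highlight(line []rune, ms []matchIndex) string {
	colored := make([]bool, len(line))
	for _, i := range flatten(ms) {
		colored[i] = true
	}
	var b strings.Builder
	for i, r := range line {
		if colored[i] {
			b.WriteString("\x1b[31m" + string(r) + "\x1b[0m")
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	ms := []matchIndex{{1, 3}, {5, 6}}
	fmt.Println(flatten(ms)) // [1 2 5]
	fmt.Println(highlight([]rune("abcdef"), ms))
}
```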
b55b80ec6c Updated TODO
I didn't like the existing capturing group implementation, so I moved
that to a separate branch. This branch does not (at the moment) contain any code
relating to capturing groups.
2024-11-17 21:29:18 -05:00
137ea3c746 Made findAllMatchesHelper non-recursive, added pruneIndices (improved performance) and more changes
I made findAllMatchesHelper a non-recursive function. It now only
returns the first match it finds in the string (so I should probably
rename it).

These indices are collected by findAllMatches and pruned (to
remove overlaps). The overlap function has also been rewritten, to make
it (I believe) less than O(n^2). I also used the uniq_arr type to make
checking for uniqueness O(1) instead of O(n) (as it was with
unique_append()). This has resulted in massive performance gains.

There's been a lot of changes here, and I probably haven't documented
all of them.
2024-11-07 16:16:50 -05:00
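One way to prune overlapping match indices in less than O(n^2) is to sort and sweep. This is a swapped-in technique, not necessarily how pruneIndices works: the `Match` type and `pruneOverlaps` function below are hypothetical, and ranges are assumed to be half-open `[Start, End)`.

```go
package main

import (
	"fmt"
	"sort"
)

// Match is a hypothetical half-open match range [Start, End).
type Match struct{ Start, End int }

// pruneOverlaps keeps the leftmost match and drops any later match that
// overlaps one already kept. Sorting dominates, so this runs in O(n log n).
func pruneOverlaps(ms []Match) []Match {
	sorted := append([]Match(nil), ms...) // don't reorder the caller's slice
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].Start != sorted[j].Start {
			return sorted[i].Start < sorted[j].Start
		}
		return sorted[i].End > sorted[j].End // longer match first on ties
	})
	kept := make([]Match, 0, len(sorted))
	lastEnd := -1
	for _, m := range sorted {
		if m.Start >= lastEnd {
			kept = append(kept, m)
			lastEnd = m.End
		}
	}
	return kept
}

func main() {
	// Matches collected from several starting offsets, in no particular order.
	found := []Match{{2, 5}, {1, 5}, {4, 5}, {3, 5}, {7, 9}}
	fmt.Println(pruneOverlaps(found)) // [{1 5} {7 9}]
}
```

Sorting by start (longest first on ties) makes each kept match the leftmost candidate, so a single pass can then drop everything that starts before the last kept match ends.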
9201ed49bd Changed type from matchIndex to MatchIndex 2024-11-07 16:12:21 -05:00
9a073aa514 Added node types for left and right parentheses 2024-11-07 15:55:37 -05:00
7d265495f5 Got rid of list for uniq_arr (O(n) deletion) and instead have separate method to create list (O(n) list creation) 2024-11-07 15:55:13 -05:00
e2e99ff6a9 Added function to generate numbers in a range; added capacity to some slices to prevent unnecessary reallocations 2024-11-06 15:16:51 -05:00
8a69ea8cb7 Added unique array data structure - O(1) addition and retrieval (I think) 2024-11-06 15:15:44 -05:00
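A map-backed set is a common way to get O(1) average-time insertion and lookup. The sketch below assumes Go 1.18+ generics. The `UniqArr` name and its methods are hypothetical stand-ins for the project's uniq_arr type. `Values` mirrors the later commit (7d265495f5) that builds the list on demand in O(n) instead of maintaining it alongside the set.

```go
package main

import "fmt"

// UniqArr is a hypothetical set-like "unique array" backed by a map,
// giving O(1) average-time insertion and membership checks.
type UniqArr[T comparable] struct {
	items map[T]struct{}
}

func NewUniqArr[T comparable]() *UniqArr[T] {
	return &UniqArr[T]{items: make(map[T]struct{})}
}

// Add inserts v; adding a value that is already present is a no-op.
func (u *UniqArr[T]) Add(v T) { u.items[v] = struct{}{} }

// Contains reports whether v has been added.
func (u *UniqArr[T]) Contains(v T) bool {
	_, ok := u.items[v]
	return ok
}

// Values builds a slice of the stored values in O(n). Map iteration order
// is unspecified, so callers that need an order must sort the result.
func (u *UniqArr[T]) Values() []T {
	out := make([]T, 0, len(u.items))
	for v := range u.items {
		out = append(out, v)
	}
	return out
}

func main() {
	u := NewUniqArr[int]()
	for _, v := range []int{3, 1, 3, 2, 1} {
		u.Add(v)
	}
	fmt.Println(u.Contains(2), u.Contains(5), len(u.Values())) // true false 3
}
```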
ea17251bf8 Might have made a change to improve performance 2024-11-04 08:42:26 -05:00
e8aca8606a Added test cases 2024-11-03 15:09:21 -05:00
9698c4f1d8 Fixed error in calculating word boundary (off-by-one) 2024-11-03 15:04:57 -05:00
c032dcb2ea Added more test cases 2024-11-03 15:04:19 -05:00
269e2d0e1c Updated go.mod 2024-11-03 14:38:46 -05:00
21142e6e13 Wrote function to clone the NFA starting at a given state, and a function to find question mark operator (a? == (a|)) 2024-11-03 14:37:38 -05:00
b602295bee Added support for specifying how often a postfixNode is repeated 2024-11-03 14:36:56 -05:00
1d9d1a5b81 Fixed calculation of overlapping (used to check for subset instead) 2024-11-03 14:36:23 -05:00
d8f52b8ccc Added support for numeric specifiers, moved question mark operator to its own function 2024-11-03 14:36:04 -05:00
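Numeric specifiers can be reduced to operators the engine already has. `x{n,m}` is `n` mandatory copies of `x` followed by `m-n` optional ones, and `x{n,}` ends with `x*`. The sketch below does this rewrite at the pattern-string level for illustration only. The function name `expandRepeat` is hypothetical; judging by the commit log, the project clones NFA fragments instead. The result is cross-checked against Go's standard `regexp` package.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// expandRepeat rewrites atom{lo,hi} using only concatenation, '?' and '*':
// lo mandatory copies, then (hi-lo) optional copies. hi < 0 means no upper
// bound ({lo,}). atom must be a single unit, e.g. "a" or "(ab)".
func expandRepeat(atom string, lo, hi int) string {
	var b strings.Builder
	b.WriteString(strings.Repeat(atom, lo))
	if hi < 0 {
		b.WriteString(atom + "*")
		return b.String()
	}
	for i := lo; i < hi; i++ {
		b.WriteString(atom + "?")
	}
	return b.String()
}

func main() {
	expanded := expandRepeat("a", 2, 4)
	fmt.Println(expanded) // aaa?a?
	// Cross-check against Go's regexp, which supports a{2,4} directly.
	re1 := regexp.MustCompile("^" + expanded + "$")
	re2 := regexp.MustCompile("^a{2,4}$")
	for _, s := range []string{"a", "aa", "aaaa", "aaaaa"} {
		fmt.Println(s, re1.MatchString(s), re2.MatchString(s))
	}
}
```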
dca81c1796 Replaced rune-slice parameters with string parameters in functions; avoids unnecessary conversion from strings to rune-slices 2024-11-01 01:53:50 -04:00
723be527fb Updated TODO 2024-10-31 17:56:45 -04:00
fccd3a76f5 Wrote function to check if the assertion of a state is true 2024-10-31 17:56:04 -04:00
315f68df12 Fixed typo 2024-10-31 17:55:41 -04:00
fd957d9518 Added more test cases 2024-10-31 17:55:07 -04:00
19dc5064c8 Made conditions for word boundary a little more relaxed 2024-10-31 17:54:45 -04:00
a19d409796 Set node type to ASSERTION if the character represents an assertion 2024-10-31 17:14:56 -04:00
0736e813c1 Fixed boneheaded mistake with checking assertion types 2024-10-31 17:14:03 -04:00
1aff6e2fa4 Added a field to State that tells me what kind of assertion (if any) it is making. Also added a function to check if a state's contents contain a given value (checks assertions), and one to find all matches that a state has for a character 2024-10-31 17:13:34 -04:00
f3bf5e9740 Added function to check for word boundaries and delete an element from a slice 2024-10-31 17:09:25 -04:00
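A word boundary sits at position `i` when exactly one of the runes on either side of `i` is a word character. The positions before the first rune and after the last are treated as non-word. The function names below are hypothetical, and counting Unicode letters and digits as word characters is an assumption.

```go
package main

import (
	"fmt"
	"unicode"
)

// isWordChar treats '_', letters and digits (including Unicode ones) as word characters.
func isWordChar(r rune) bool {
	return r == '_' || unicode.IsLetter(r) || unicode.IsDigit(r)
}

// isWordBoundary reports whether position idx (0 <= idx <= len(str)) lies
// between a word character and a non-word character.
func isWordBoundary(str []rune, idx int) bool {
	before := idx > 0 && isWordChar(str[idx-1])
	after := idx < len(str) && isWordChar(str[idx])
	return before != after
}

func main() {
	s := []rune("hi there")
	var bounds []int
	for i := 0; i <= len(s); i++ {
		if isWordBoundary(s, i) {
			bounds = append(bounds, i)
		}
	}
	fmt.Println(bounds) // [0 2 3 8]
}
```

Positions run from 0 to len(str) inclusive, so a match ending at the very end of the line can still test for a boundary. Getting that range wrong is an easy source of off-by-one errors.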
20db62c596 Got rid of function that I don't need anymore 2024-10-31 17:09:02 -04:00
360bdc8e11 Big rewrite - assertion handling, zero-match fixes, change in recursive calls
I added support for transitions. I wrote a function to determine if
a given state has transitions for a character at a given point in the
string. This helps me check if the current state has an assertion, and
take actions based on that.

I also fixed zero-length matching (almost, see todo.txt). It works for
nearly all cases I could think of, although I still need to write more
tests. I wrote a function to check if zero-length matches are possible
with a given state.

I also changed the way recursive calls work. Rather than passing a
modified string, the function stores the location in the input string.
This location is updated with each call to the function.

Finally, the function now increments the offset by 1 instead of
incrementing by the length of the longest match. This leads to a bit of
overhead, e.g. if a regex matches indices 1-5, then 1-5, 2-5, 3-5 and 4-5 are
all stored. To fix this, I wrote (and used) a function to check if
a match overlaps with any matches in a slice.
2024-10-31 17:06:32 -04:00
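The overlap check can be written as a single comparison on half-open ranges: two matches overlap when each one starts before the other ends. Unlike a subset test, this also catches partial overlaps such as 2-6 against 1-5. The `Match` type and the function names are hypothetical.

```go
package main

import "fmt"

// Match is a hypothetical half-open index range [Start, End).
type Match struct{ Start, End int }

// overlaps reports whether a and b share at least one index.
func overlaps(a, b Match) bool {
	return a.Start < b.End && b.Start < a.End
}

// overlapsAny checks m against every previously recorded match (O(n) per call).
func overlapsAny(m Match, ms []Match) bool {
	for _, other := range ms {
		if overlaps(m, other) {
			return true
		}
	}
	return false
}

func main() {
	var kept []Match
	// Advancing the offset by 1 can report 1-5, then 2-5, 3-5 and 4-5.
	for _, m := range []Match{{1, 5}, {2, 5}, {3, 5}, {4, 5}, {6, 8}} {
		if !overlapsAny(m, kept) {
			kept = append(kept, m)
		}
	}
	fmt.Println(kept) // [{1 5} {6 8}]
}
```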
8dbecde3ae Added support for detecting assertion characters; changed input so that newline isn't required 2024-10-31 16:51:53 -04:00
a752491563 Added more test cases 2024-10-31 16:47:59 -04:00
656c506aa8 Wrote function to provide correct node for escaped character 2024-10-30 09:33:52 -04:00
1bafdcdb7e Added support for inverted matches; moved escape character detection to its own function 2024-10-30 09:33:25 -04:00
5f4a6c5a3b Added constants for LBRACKET and RBRACKET 2024-10-30 09:32:50 -04:00
e6c607319c Added more tests 2024-10-30 09:32:32 -04:00
8e8e9e133f Fixed matching greediness, e.g. a(a|b)*a would not match 'aaa' in 'aaab' 2024-10-29 20:07:30 -04:00
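Go's standard `regexp` package can serve as a reference for the expected result of the example in this commit. The code uses the standard library, not the project's engine. The helper name `firstMatch` is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch returns the leftmost match of pattern in s using Go's standard
// regexp package (leftmost-first semantics, greedy '*').
func firstMatch(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

func main() {
	// The star cannot keep the 'b', because the pattern still needs a final
	// 'a' after it; the match starting at index 0 is therefore "aaa".
	fmt.Println(firstMatch(`a(a|b)*a`, "aaab")) // aaa
}
```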
a619fd24f6 Added map and reduce functions, and a function to return the difference between two sets 2024-10-29 20:06:09 -04:00
f8ee1b3200 Added more tests 2024-10-29 20:05:42 -04:00
a66e8f1c08 Concatenate every character if it is escaped 2024-10-29 20:05:30 -04:00
d8299294ed Added test cases 2024-10-29 14:41:00 -04:00
45d348e7f4 Updated TODO 2024-10-29 10:08:41 -04:00
7b815343f4 Removed exclamation mark in inverted metacharacters - had the opposite effect because of the way deleteFunc works 2024-10-29 10:07:55 -04:00
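`slices.DeleteFunc` (Go 1.21+) removes the elements for which its predicate returns true, so a predicate written as "keep these" has the opposite effect. This is a hypothetical sketch of building an inverted class such as \D by deleting digits from a candidate list. The function name `nonDigits` and the candidate-list approach are assumptions.

```go
package main

import (
	"fmt"
	"slices"
	"unicode"
)

// nonDigits builds a character set for an inverted class like \D by
// deleting every digit from a list of candidate runes. The predicate
// passed to DeleteFunc must describe what to DROP, not what to keep.
func nonDigits(candidates []rune) []rune {
	c := slices.Clone(candidates) // DeleteFunc modifies its argument in place
	return slices.DeleteFunc(c, unicode.IsDigit)
}

func main() {
	chars := []rune("a1b2c3")
	fmt.Println(string(nonDigits(chars))) // abc
	// Negating the predicate keeps the digits instead:
	kept := slices.DeleteFunc(slices.Clone(chars), func(r rune) bool { return !unicode.IsDigit(r) })
	fmt.Println(string(kept)) // 123
}
```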
1a7fd12569 Added support for some escaped metacharacters 2024-10-29 10:05:39 -04:00