Aadhavan Srinivasan
1e0502c6aa
Added unicode tests
1 month ago
Aadhavan Srinivasan
c56d81a335
Added unicode support to dot metacharacter - it now matches _any_ unicode character (almost)
1 month ago
Aadhavan Srinivasan
8a1f1dc621
Added unicode support
...
Replaced strings with rune-slices, which capture unicode codepoints more
accurately.
1 month ago
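The reason rune-slices capture code points more accurately than strings can be seen in a small sketch. The helper `runeAt` is a hypothetical name, not taken from the repository; it only illustrates that indexing a `[]rune` yields whole code points, whereas the length of a Go string counts UTF-8 bytes.

```go
package main

import "fmt"

// runeAt is a hypothetical helper: it returns the i-th Unicode code point of s.
// Converting to []rune decodes the UTF-8 bytes into code points first.
func runeAt(s string, i int) rune {
	return []rune(s)[i]
}

func main() {
	s := "na\u00efve" // "naïve"; the third character takes two bytes in UTF-8

	fmt.Println(len(s))               // 6: length in bytes
	fmt.Println(len([]rune(s)))       // 5: length in code points
	fmt.Println(string(runeAt(s, 2))) // the whole character, not half of it
}
```

A matcher that walks the rune-slice advances one character at a time, so a multi-byte character is never split across two steps.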
Aadhavan Srinivasan
805766a5ba
Added support for -l : only print lines with at least one match (or with exactly 0 matches, if -v is enabled)
1 month ago
Aadhavan Srinivasan
dcd712dceb
Added support for -o flag: only print matching content
1 month ago
Aadhavan Srinivasan
f2b8812b05
Added support for -v flag, to invert which values are printed in color. Also got rid of unnecessary 'else' clause
1 month ago
Aadhavan Srinivasan
11641596fa
Read multiple lines from stdin and apply regex to each one; Convert the array of matchIndex structs into a flat array of indices; speeds up process of checking if we have to print a character in color
1 month ago
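One way to picture the flattening step is sketched below. The names `matchRange`, `flatten` and `contains` are hypothetical, and the ranges are assumed to be half-open (`[start, end)`); the repository's matchIndex struct may be laid out differently.

```go
package main

import "fmt"

// matchRange is a hypothetical stand-in for the matchIndex struct.
// The range is assumed to be half-open: [start, end).
type matchRange struct {
	start, end int
}

// flatten expands each range into the individual indices it covers.
func flatten(ranges []matchRange) []int {
	indices := make([]int, 0)
	for _, r := range ranges {
		for i := r.start; i < r.end; i++ {
			indices = append(indices, i)
		}
	}
	return indices
}

// contains reports whether idx appears in indices.
func contains(indices []int, idx int) bool {
	for _, v := range indices {
		if v == idx {
			return true
		}
	}
	return false
}

func main() {
	indices := flatten([]matchRange{{0, 2}, {4, 6}})
	// Brackets stand in for color here.
	for i, r := range []rune("abcdef") {
		if contains(indices, i) {
			fmt.Printf("[%c]", r)
		} else {
			fmt.Printf("%c", r)
		}
	}
	fmt.Println()
}
```

With the indices flattened, the printing loop asks a single question per character ("is this index in the list?") rather than comparing the index against the bounds of every match.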
Aadhavan Srinivasan
b55b80ec6c
Updated TODO
...
I didn't like the existing capturing group implementation, so I moved
that to a separate branch. This branch does not (at the moment) contain any code
relating to capturing groups.
1 month ago
Aadhavan Srinivasan
137ea3c746
Made findAllMatchesHelper non-recursive, added pruneIndices (improved performance) and more changes
...
I made findAllMatchesHelper a non-recursive function. It now only
returns the first match it finds in the string (so I should probably
rename it).
These indices are collected by findAllMatches and pruned (to
remove overlaps). The overlap function has also been rewritten, to make
it (I believe) less than O(n^2). I also used the uniq_arr type to make
checking for uniqueness O(1) instead of O(n) (as it was with
unique_append()). This has resulted in massive performance gains.
There have been a lot of changes here, and I probably haven't documented
all of them.
2 months ago
Aadhavan Srinivasan
9201ed49bd
Changed type from matchIndex to MatchIndex
2 months ago
Aadhavan Srinivasan
9a073aa514
Added node types for left and right parentheses
2 months ago
Aadhavan Srinivasan
7d265495f5
Got rid of list for uniq_arr (O(n) deletion) and instead have separate method to create list (O(n) list creation)
2 months ago
Aadhavan Srinivasan
e2e99ff6a9
Added function to generate numbers in a range; added capacity to some slices to prevent unnecessary reallocations
2 months ago
Aadhavan Srinivasan
8a69ea8cb7
Added unique array data structure - O(1) addition and retrieval (I think)
2 months ago
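A unique-array structure with constant-time addition and lookup is commonly built on a hash map. The sketch below is an assumption about the general shape, not the repository's uniq_arr type: the names `uniqueArr`, `newUniqueArr`, `add`, `contains` and `size` are all hypothetical, and the constant-time bound is the expected (average) cost of Go's built-in map.

```go
package main

import "fmt"

// uniqueArr is a hypothetical sketch of a collection that stores each value
// at most once. It is backed by a map, so add and contains take expected
// constant time.
type uniqueArr[T comparable] struct {
	seen map[T]struct{}
}

// newUniqueArr returns an empty uniqueArr.
func newUniqueArr[T comparable]() *uniqueArr[T] {
	return &uniqueArr[T]{seen: make(map[T]struct{})}
}

// add inserts v and reports whether it was newly added.
func (u *uniqueArr[T]) add(v T) bool {
	if _, ok := u.seen[v]; ok {
		return false
	}
	u.seen[v] = struct{}{}
	return true
}

// contains reports whether v has been added.
func (u *uniqueArr[T]) contains(v T) bool {
	_, ok := u.seen[v]
	return ok
}

// size returns the number of distinct values stored.
func (u *uniqueArr[T]) size() int {
	return len(u.seen)
}

func main() {
	u := newUniqueArr[int]()
	u.add(3)
	u.add(5)
	u.add(3) // duplicate, ignored
	fmt.Println(u.size(), u.contains(5), u.contains(7))
}
```

Compared with appending to a slice and scanning it for duplicates, the map lookup avoids a linear pass on every insertion.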
Aadhavan Srinivasan
ea17251bf8
Might have made a change to improve performance
2 months ago
Aadhavan Srinivasan
e8aca8606a
Added test cases
2 months ago
Aadhavan Srinivasan
9698c4f1d8
Fixed error in calculating word boundary (off-by-one)
2 months ago
Aadhavan Srinivasan
c032dcb2ea
Added more test cases
2 months ago
Aadhavan Srinivasan
269e2d0e1c
Updated go.mod
2 months ago
Aadhavan Srinivasan
21142e6e13
Wrote function to clone the NFA starting at a given state, and a function to find question mark operator (a? == (a|))
2 months ago
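The identity `a? == (a|)` can be checked with a quick sketch. This uses Go's standard `regexp` package rather than this project's engine, and the helper `sameMatches` is a hypothetical name introduced only for the comparison.

```go
package main

import (
	"fmt"
	"regexp"
)

// sameMatches reports whether two patterns accept or reject every input
// in the same way.
func sameMatches(p1, p2 string, inputs []string) bool {
	r1 := regexp.MustCompile(p1)
	r2 := regexp.MustCompile(p2)
	for _, in := range inputs {
		if r1.MatchString(in) != r2.MatchString(in) {
			return false
		}
	}
	return true
}

func main() {
	inputs := []string{"", "a", "aa", "b"}
	// Anchors force a whole-string match so the comparison is meaningful.
	fmt.Println(sameMatches(`^a?$`, `^(a|)$`, inputs)) // true
}
```

Because the optional operator reduces to an alternation with an empty branch, an engine that already supports alternation and empty matches can support `?` without a new primitive.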
Aadhavan Srinivasan
b602295bee
Added support for specifying how often a postfixNode is repeated
2 months ago
Aadhavan Srinivasan
1d9d1a5b81
Fixed calculation of overlapping (used to check for subset instead)
2 months ago
Aadhavan Srinivasan
d8f52b8ccc
Added support for numeric specifiers, moved question mark operator to its own function
2 months ago
Aadhavan Srinivasan
dca81c1796
Replaced rune-slice parameters with string parameters in functions; avoids unnecessary conversion from strings to rune-slices
2 months ago
Aadhavan Srinivasan
723be527fb
Updated TODO
2 months ago
Aadhavan Srinivasan
fccd3a76f5
Wrote function to check if the assertion of a state is true
2 months ago
Aadhavan Srinivasan
315f68df12
Fixed typo
2 months ago
Aadhavan Srinivasan
fd957d9518
Added more test cases
2 months ago
Aadhavan Srinivasan
19dc5064c8
Made conditions for word boundary a little more relaxed
2 months ago
Aadhavan Srinivasan
a19d409796
Set node type to ASSERTION if the character represents an assertion
2 months ago
Aadhavan Srinivasan
0736e813c1
Fixed boneheaded mistake with checking assertion types
2 months ago
Aadhavan Srinivasan
1aff6e2fa4
Added a field to State, that tells me what kind of assertion (if any) it is making. Also added function to check if a state's contents contain a given value (checks assertions), and to find all matches that a state has for a character
2 months ago
Aadhavan Srinivasan
f3bf5e9740
Added function to check for word boundaries and delete an element from a slice
2 months ago
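A word-boundary check can be sketched as follows, assuming the usual definition: a position is a boundary when exactly one of its two neighbours is a word character. The names `isWordChar` and `isWordBoundary` are hypothetical, and the ASCII-only notion of a word character (`[A-Za-z0-9_]`) is an assumption; the repository's rules may be looser or stricter.

```go
package main

import "fmt"

// isWordChar reports whether r is an ASCII word character: [A-Za-z0-9_].
func isWordChar(r rune) bool {
	return r == '_' ||
		(r >= 'a' && r <= 'z') ||
		(r >= 'A' && r <= 'Z') ||
		(r >= '0' && r <= '9')
}

// isWordBoundary reports whether the position idx (the gap just before
// s[idx]) is a word boundary. Positions outside the slice count as
// non-word characters, so the start and end of a word-initial or
// word-final string are boundaries.
func isWordBoundary(s []rune, idx int) bool {
	before := idx > 0 && isWordChar(s[idx-1])
	after := idx < len(s) && isWordChar(s[idx])
	return before != after
}

func main() {
	s := []rune("hi there")
	for i := 0; i <= len(s); i++ {
		if isWordBoundary(s, i) {
			fmt.Print(i, " ")
		}
	}
	fmt.Println() // 0 2 3 8
}
```

The bounds checks on `idx-1` and `idx` are where off-by-one errors tend to creep in, which is why the sketch handles both ends of the slice explicitly.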
Aadhavan Srinivasan
20db62c596
Got rid of function that I don't need anymore
2 months ago
Aadhavan Srinivasan
360bdc8e11
Big rewrite - assertion handling, zero-match fixes, change in recursive calls
...
I added support for transitions. I wrote a function to determine if
a given state has transitions for a character at a given point in the
string. This helps me check if the current state has an assertion, and
take actions based on that.
I also fixed zero-length matching (almost, see todo.txt). It works for
nearly all cases I could think of, although I still need to write more
tests. I wrote a function to check if zero-length matches are possible
with a given state.
I also changed the way recursive calls work. Rather than passing a
modified string, the function stores the location in the input string.
This location is updated with each call to the function.
Finally, the function now increments the offset by 1 instead of
incrementing by the length of the longest match. This leads to a bit of
overhead, e.g. if a regex matches index 1-5, then 1-5, 2-5, 3-5, 4-5 are
all stored. To fix this, I wrote (and used) a function to check if
a match overlaps with any matches in a slice.
2 months ago
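The overlap filtering described above can be sketched like this. The type `span` and the functions `overlaps`, `overlapsAny` and `keepNonOverlapping` are hypothetical names, and the ranges are assumed to be half-open (`[start, end)`), which may differ from how the repository stores match indices.

```go
package main

import "fmt"

// span is a hypothetical match range, half-open: [start, end).
type span struct {
	start, end int
}

// overlaps reports whether a and b share at least one index.
// Note that this is an overlap test, not a subset test.
func overlaps(a, b span) bool {
	return a.start < b.end && b.start < a.end
}

// overlapsAny reports whether m overlaps any span in existing.
func overlapsAny(m span, existing []span) bool {
	for _, e := range existing {
		if overlaps(m, e) {
			return true
		}
	}
	return false
}

// keepNonOverlapping keeps each candidate only if it does not overlap a
// span that was kept earlier, so the earliest candidate wins.
func keepNonOverlapping(candidates []span) []span {
	var kept []span
	for _, c := range candidates {
		if !overlapsAny(c, kept) {
			kept = append(kept, c)
		}
	}
	return kept
}

func main() {
	// Candidates as they might be found when the offset advances by one.
	candidates := []span{{1, 5}, {2, 5}, {3, 5}, {4, 5}, {7, 9}}
	fmt.Println(keepNonOverlapping(candidates)) // [{1 5} {7 9}]
}
```

This version compares every candidate with every kept span, so it is quadratic in the worst case; it is meant to show the check itself rather than an optimized pruning pass.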
Aadhavan Srinivasan
8dbecde3ae
Added support for detecting assertion characters; changed input so that newline isn't required
2 months ago
Aadhavan Srinivasan
a752491563
Added more test cases
2 months ago
Aadhavan Srinivasan
656c506aa8
Wrote function to provide correct node for escaped character
2 months ago
Aadhavan Srinivasan
1bafdcdb7e
Added support for inverted matches; moved escape character detection to its own function
2 months ago
Aadhavan Srinivasan
5f4a6c5a3b
Added constants for LBRACKET and RBRACKET
2 months ago
Aadhavan Srinivasan
e6c607319c
Added more tests
2 months ago
Aadhavan Srinivasan
8e8e9e133f
Fixed matching greediness, e.g. a(a|b)*a would not match 'aaa' in 'aaab'
2 months ago
Aadhavan Srinivasan
a619fd24f6
Added map and reduce functions, and a function to return the difference between two sets
2 months ago
Aadhavan Srinivasan
f8ee1b3200
Added more tests
2 months ago
Aadhavan Srinivasan
a66e8f1c08
Concatenate every character if it is escaped
2 months ago
Aadhavan Srinivasan
d8299294ed
Added test cases
2 months ago
Aadhavan Srinivasan
45d348e7f4
Updated TODO
2 months ago
Aadhavan Srinivasan
7b815343f4
Removed exclamation mark in inverted metacharacters - had the opposite effect because of the way deleteFunc works
2 months ago
Aadhavan Srinivasan
1a7fd12569
Added support for some escaped metacharacters
2 months ago
Aadhavan Srinivasan
b8d5ea0897
Wrote function to create a character node regardless of the contents of the node
2 months ago