115 Commits (992c5a9300ca605ae447b019b1e210ccc46b7333)
 

Author SHA1 Message Date
Aadhavan Srinivasan 992c5a9300 Replaced isAlphaNum() with isNormalChar(), which returns true if the character isn't special (it also returns true for unicode characters, which the previous function didn't) 1 month ago
Aadhavan Srinivasan 1e0502c6aa Added unicode tests 1 month ago
Aadhavan Srinivasan c56d81a335 Added unicode support to dot metacharacter - it now matches _any_ unicode character (almost) 1 month ago
Aadhavan Srinivasan 8a1f1dc621 Added unicode support
Replaced strings with rune-slices, which capture unicode codepoints more
accurately.
1 month ago
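For readers unfamiliar with why rune-slices matter here, the following is a minimal Go sketch of the difference between indexing a string by byte and by code point. The helper names runeAt and byteAt are hypothetical and do not come from the repository.

```go
package main

// runeAt returns the i-th Unicode code point of s.
// Converting to []rune decodes the UTF-8 bytes into whole code points first.
// (Hypothetical helper, for illustration only.)
func runeAt(s string, i int) rune {
	return []rune(s)[i]
}

// byteAt returns the i-th byte of s. For a multi-byte character this is
// only a fragment of its UTF-8 encoding, not the character itself.
// (Hypothetical helper, for illustration only.)
func byteAt(s string, i int) byte {
	return s[i]
}
```

With the string "h\u00e9llo", len(s) counts six bytes while len([]rune(s)) counts five code points, so a matcher that steps through bytes would see the accented letter as two separate units.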
Aadhavan Srinivasan 805766a5ba Added support for -l : only print lines with at least one match (or with exactly 0 matches, if -v is enabled) 1 month ago
Aadhavan Srinivasan dcd712dceb Added support for -o flag: only print matching content 1 month ago
Aadhavan Srinivasan f2b8812b05 Added support for -v flag, to invert which values are printed in color. Also got rid of unnecessary 'else' clause 1 month ago
Aadhavan Srinivasan 11641596fa Read multiple lines from stdin and apply regex to each one; Convert the array of matchIndex structs into a flat array of indices; speeds up process of checking if we have to print a character in color 1 month ago
Aadhavan Srinivasan b55b80ec6c Updated TODO
I didn't like the existing capturing group implementation, so I moved
that to a separate branch. This branch does not (at the moment) contain any code
relating to capturing groups.
1 month ago
Aadhavan Srinivasan 137ea3c746 Made findAllMatchesHelper non-recursive, added pruneIndices (improved performance) and more changes
I made findAllMatchesHelper a non-recursive function. It now only
returns the first match it finds in the string (so I should probably
rename it).

These indices are collected by findAllMatches and pruned (to
remove overlaps). The overlap function has also been rewritten, to make
it (I believe) less than O(n^2). I also used the uniq_arr type to make
checking for uniqueness O(1) instead of O(n) (as it was with
unique_append()). This has resulted in massive performance gains.

There have been a lot of changes here, and I probably haven't documented
all of them.
2 months ago
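The O(1) uniqueness check described in this commit can be illustrated with a map-backed collection. This is only a sketch under assumptions: the type uniqueInts and its methods are hypothetical names, and the repository's actual uniq_arr type may be structured differently.

```go
package main

// uniqueInts is a hypothetical stand-in for a "unique array":
// a map gives average O(1) membership checks, and a slice keeps
// the values in insertion order.
type uniqueInts struct {
	seen  map[int]struct{}
	items []int
}

// newUniqueInts returns an empty collection.
func newUniqueInts() *uniqueInts {
	return &uniqueInts{seen: make(map[int]struct{})}
}

// contains reports whether v has already been added (average O(1)).
func (u *uniqueInts) contains(v int) bool {
	_, ok := u.seen[v]
	return ok
}

// add appends v only if it is not already present.
// It reports whether the value was actually added.
func (u *uniqueInts) add(v int) bool {
	if u.contains(v) {
		return false
	}
	u.seen[v] = struct{}{}
	u.items = append(u.items, v)
	return true
}

// values returns a copy of the stored values in insertion order (O(n)).
func (u *uniqueInts) values() []int {
	out := make([]int, len(u.items))
	copy(out, u.items)
	return out
}
```

Compared with scanning a slice before every append, the map lookup avoids the O(n) pass per insertion, which is the kind of saving the commit message attributes to dropping unique_append().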
Aadhavan Srinivasan 9201ed49bd Changed type from matchIndex to MatchIndex 2 months ago
Aadhavan Srinivasan 9a073aa514 Added node types for left and right parentheses 2 months ago
Aadhavan Srinivasan 7d265495f5 Got rid of list for uniq_arr (O(n) deletion) and instead have separate method to create list (O(n) list creation) 2 months ago
Aadhavan Srinivasan e2e99ff6a9 Added function to generate numbers in a range; added capacity to some slices to prevent unnecessary reallocations 2 months ago
Aadhavan Srinivasan 8a69ea8cb7 Added unique array data structure - O(1) addition and retrieval (I think) 2 months ago
Aadhavan Srinivasan ea17251bf8 Might have made a change to improve performance 2 months ago
Aadhavan Srinivasan e8aca8606a Added test cases 2 months ago
Aadhavan Srinivasan 9698c4f1d8 Fixed error in calculating word boundary (off-by-one) 2 months ago
Aadhavan Srinivasan c032dcb2ea Added more test cases 2 months ago
Aadhavan Srinivasan 269e2d0e1c Updated go.mod 2 months ago
Aadhavan Srinivasan 21142e6e13 Wrote function to clone the NFA starting at a given state, and a function to find question mark operator (a? == (a|)) 2 months ago
Aadhavan Srinivasan b602295bee Added support for specifying how often a postfixNode is repeated 2 months ago
Aadhavan Srinivasan 1d9d1a5b81 Fixed calculation of overlapping (used to check for subset instead) 2 months ago
Aadhavan Srinivasan d8f52b8ccc Added support for numeric specifiers, moved question mark operator to its own function 2 months ago
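The identity a? == (a|) noted above, and the numeric specifiers built on it, can be sketched as a rewrite on pattern text. This is an illustration only: the repository works on postfix nodes and NFA states rather than strings, and the function names optional and expandRepeat are hypothetical. The sketch assumes the atom is a single character or an already parenthesised group, and that 0 <= lo <= hi.

```go
package main

import "strings"

// optional rewrites "x?" as "(x|)": either x or the empty string.
// (Hypothetical helper; assumes atom is a single character or a
// parenthesised group.)
func optional(atom string) string {
	return "(" + atom + "|)"
}

// expandRepeat rewrites atom{lo,hi} as lo mandatory copies of atom
// followed by (hi - lo) optional copies.
// Assumes 0 <= lo <= hi; strings.Repeat panics on a negative count.
func expandRepeat(atom string, lo, hi int) string {
	return strings.Repeat(atom, lo) + strings.Repeat(optional(atom), hi-lo)
}
```

For example, expandRepeat("a", 2, 4) yields "aa(a|)(a|)", which accepts two, three, or four copies of "a".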
Aadhavan Srinivasan dca81c1796 Replaced rune-slice parameters with string parameters in functions; avoids unnecessary conversion from strings to rune-slices 2 months ago
Aadhavan Srinivasan 723be527fb Updated TODO 2 months ago
Aadhavan Srinivasan fccd3a76f5 Wrote function to check if the assertion of a state is true 2 months ago
Aadhavan Srinivasan 315f68df12 Fixed typo 2 months ago
Aadhavan Srinivasan fd957d9518 Added more test cases 2 months ago
Aadhavan Srinivasan 19dc5064c8 Made conditions for word boundary a little more relaxed 2 months ago
Aadhavan Srinivasan a19d409796 Set node type to ASSERTION if the character represents an assertion 2 months ago
Aadhavan Srinivasan 0736e813c1 Fixed boneheaded mistake with checking assertion types 2 months ago
Aadhavan Srinivasan 1aff6e2fa4 Added a field to State, that tells me what kind of assertion (if any) it is making. Also added function to check if a state's contents contain a given value (checks assertions), and to find all matches that a state has for a character 2 months ago
Aadhavan Srinivasan f3bf5e9740 Added function to check for word boundaries and delete an element from a slice 2 months ago
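A word-boundary check of the kind mentioned here is commonly written by comparing the characters on either side of a position. The sketch below uses the conventional definition (letters, digits, and underscore count as word characters); the names isWordChar and isWordBoundary are hypothetical, and the repository's own conditions may differ, since a later commit relaxes them.

```go
package main

// isWordChar reports whether r is an ASCII letter, digit, or underscore.
// (Hypothetical helper using the conventional \w definition.)
func isWordChar(r rune) bool {
	return r == '_' ||
		(r >= '0' && r <= '9') ||
		(r >= 'a' && r <= 'z') ||
		(r >= 'A' && r <= 'Z')
}

// isWordBoundary reports whether position i in text lies between a word
// character and a non-word character. Positions before the start and
// after the end of text are treated as non-word.
func isWordBoundary(text []rune, i int) bool {
	before := i > 0 && i <= len(text) && isWordChar(text[i-1])
	after := i >= 0 && i < len(text) && isWordChar(text[i])
	return before != after
}
```

Treating the positions outside the text as non-word is what makes the start and end of "hi there" count as boundaries, and getting those edge indices wrong is a typical source of off-by-one errors in this check.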
Aadhavan Srinivasan 20db62c596 Got rid of function that I don't need anymore 2 months ago
Aadhavan Srinivasan 360bdc8e11 Big rewrite - assertion handling, zero-match fixes, change in recursive calls
I added support for transitions. I wrote a function to determine if
a given state has transitions for a character at a given point in the
string. This helps me check if the current state has an assertion, and
take actions based on that.

I also fixed zero-length matching (almost, see todo.txt). It works for
nearly all cases I could think of, although I still need to write more
tests. I wrote a function to check if zero-length matches are possible
with a given state.

I also changed the way recursive calls work. Rather than passing a
modified string, the function stores the location in the input string.
This location is updated with each call to the function.

Finally, the function now increments the offset by 1 instead of
incrementing by the length of the longest match. This leads to a bit of
overhead, e.g. if a regex matches index 1-5, then 1-5, 2-5, 3-5, 4-5 are
all stored. To fix this, I wrote (and used) a function to check if
a match overlaps with any matches in a slice.
2 months ago
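The overlap filtering described in this commit can be sketched as follows. This is a simple quadratic version written under assumptions: the span type and the functions overlaps, overlapsAny, and pruneOverlaps are hypothetical names, spans are taken to be half-open (start inclusive, end exclusive), and the earliest-found match is kept. The repository's own match type and overlap function may differ.

```go
package main

// span is a hypothetical half-open match range [start, end).
type span struct {
	start, end int
}

// overlaps reports whether a and b share at least one position.
// Note this tests for overlap, not for one span being a subset of the other.
func overlaps(a, b span) bool {
	return a.start < b.end && b.start < a.end
}

// overlapsAny reports whether m overlaps any span already in kept.
func overlapsAny(m span, kept []span) bool {
	for _, k := range kept {
		if overlaps(m, k) {
			return true
		}
	}
	return false
}

// pruneOverlaps keeps each match only if it does not overlap a match
// that was kept earlier, so the first match found at a position wins.
func pruneOverlaps(matches []span) []span {
	kept := make([]span, 0, len(matches))
	for _, m := range matches {
		if !overlapsAny(m, kept) {
			kept = append(kept, m)
		}
	}
	return kept
}
```

Given the matches 1-5, 2-5, 3-5, and 4-5 from the example above, only 1-5 survives, because each later span overlaps it.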
Aadhavan Srinivasan 8dbecde3ae Added support for detecting assertion characters; changed input so that newline isn't required 2 months ago
Aadhavan Srinivasan a752491563 Added more test cases 2 months ago
Aadhavan Srinivasan 656c506aa8 Wrote function to provide correct node for escaped character 2 months ago
Aadhavan Srinivasan 1bafdcdb7e Added support for inverted matches; moved escape character detection to its own function 2 months ago
Aadhavan Srinivasan 5f4a6c5a3b Added constants for LBRACKET and RBRACKET 2 months ago
Aadhavan Srinivasan e6c607319c Added more tests 2 months ago
Aadhavan Srinivasan 8e8e9e133f Fixed matching greediness eg. a(a|b)*a would not match 'aaa' in 'aaab' 2 months ago
Aadhavan Srinivasan a619fd24f6 Added map and reduce functions, and a function to return the difference between two sets 2 months ago
Aadhavan Srinivasan f8ee1b3200 Added more tests 2 months ago
Aadhavan Srinivasan a66e8f1c08 Concatenate every character if it is escaped 2 months ago
Aadhavan Srinivasan d8299294ed Added test cases 2 months ago
Aadhavan Srinivasan 45d348e7f4 Updated TODO 2 months ago
Aadhavan Srinivasan 7b815343f4 Removed exclamation mark in inverted metacharacters - had the opposite effect because of the way deleteFunc works 2 months ago
Aadhavan Srinivasan 1a7fd12569 Added support for some escaped metacharacters 2 months ago