Commit Graph

112 Commits (8a1f1dc621e7f4228ebf60be0571f08551f0398e)
 

Author SHA1 Message Date
Aadhavan Srinivasan 8a1f1dc621 Added unicode support
Replaced strings with rune-slices, which capture unicode codepoints more
accurately.
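Go strings are indexed by byte, so a character outside ASCII occupies several positions in a `string` but exactly one in a `[]rune`. This is a minimal sketch of the difference using only the standard library. `byteLen` and `runeLen` are illustrative names, not functions from this project.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteLen counts UTF-8 bytes, which is what len() and string indexing use.
func byteLen(s string) int { return len(s) }

// runeLen counts Unicode code points by converting to a rune-slice.
func runeLen(s string) int { return len([]rune(s)) }

func main() {
	s := "na\u00efve" // "naïve": 'ï' (U+00EF) takes two bytes in UTF-8
	fmt.Println(byteLen(s))                // 6
	fmt.Println(runeLen(s))                // 5
	fmt.Println(utf8.RuneCountInString(s)) // 5, without allocating a slice
	r := []rune(s)
	fmt.Println(string(r[2])) // one rune-slice element holds the whole 'ï'
}
```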
Aadhavan Srinivasan 805766a5ba Added support for -l: only print lines with at least one match (or with exactly 0 matches, if -v is enabled)
Aadhavan Srinivasan dcd712dceb Added support for -o flag: only print matching content
Aadhavan Srinivasan f2b8812b05 Added support for -v flag, to invert which values are printed in color. Also got rid of unnecessary 'else' clause
Aadhavan Srinivasan 11641596fa Read multiple lines from stdin and apply regex to each one; convert the array of matchIndex structs into a flat array of indices, which speeds up checking whether a character has to be printed in color
Aadhavan Srinivasan b55b80ec6c Updated TODO
I didn't like the existing capturing group implementation, so I moved
that to a separate branch. This branch does not (at the moment) contain any
code relating to capturing groups.
Aadhavan Srinivasan 137ea3c746 Made findAllMatchesHelper non-recursive, added pruneIndices (improved performance) and more changes
I made findAllMatchesHelper a non-recursive function. It now only
returns the first match it finds in the string (so I should probably
rename it).

These indices are collected by findAllMatches and pruned (to
remove overlaps). The overlap function has also been rewritten, to make
it (I believe) less than O(n^2). I also used the uniq_arr type to make
checking for uniqueness O(1) instead of O(n) (as it was with
unique_append()). This has resulted in massive performance gains.

There have been a lot of changes here, and I probably haven't documented
all of them.
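One way to prune overlaps in better than O(n^2) time is to sort the matches and sweep once, keeping a match only if it starts after the last kept match ends. This sketch rests on three assumptions:

- `MatchIndex` is a hypothetical type with half-open `[start, end)` fields.
- When two matches start at the same index, the longer one is preferred.
- The project's `pruneIndices` may use a different rule.

```go
package main

import (
	"fmt"
	"sort"
)

// MatchIndex is a hypothetical half-open span [start, end).
type MatchIndex struct {
	start, end int
}

// pruneOverlaps sorts a copy of matches by start (longer first on ties) and
// keeps each match that begins at or after the end of the last kept one.
func pruneOverlaps(matches []MatchIndex) []MatchIndex {
	sorted := append([]MatchIndex(nil), matches...)
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].start != sorted[j].start {
			return sorted[i].start < sorted[j].start
		}
		return sorted[i].end > sorted[j].end
	})
	result := make([]MatchIndex, 0, len(sorted))
	for _, m := range sorted {
		if len(result) == 0 || m.start >= result[len(result)-1].end {
			result = append(result, m)
		}
	}
	return result
}

func main() {
	// Advancing the offset by 1 yields nested suffixes such as 1-5, 2-5, 3-5, 4-5.
	ms := []MatchIndex{{2, 5}, {1, 5}, {4, 5}, {3, 5}, {7, 9}}
	fmt.Println(pruneOverlaps(ms)) // [{1 5} {7 9}]
}
```

Sorting dominates the cost, so the whole pass runs in O(n log n).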
Aadhavan Srinivasan 9201ed49bd Changed type from matchIndex to MatchIndex
Aadhavan Srinivasan 9a073aa514 Added node types for left and right parentheses
Aadhavan Srinivasan 7d265495f5 Got rid of list for uniq_arr (O(n) deletion) and instead have separate method to create list (O(n) list creation)
Aadhavan Srinivasan e2e99ff6a9 Added function to generate numbers in a range; added capacity to some slices to prevent unnecessary reallocations
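Preallocating capacity means `append` fills the existing backing array instead of growing it. This is a minimal sketch of a range generator. `genRange` and its end-exclusive convention are assumptions.

```go
package main

import "fmt"

// genRange returns the integers in [start, end). The slice is created with
// its final capacity, so append never has to reallocate.
func genRange(start, end int) []int {
	if end < start {
		return nil
	}
	nums := make([]int, 0, end-start)
	for i := start; i < end; i++ {
		nums = append(nums, i)
	}
	return nums
}

func main() {
	r := genRange(3, 7)
	fmt.Println(r, len(r), cap(r)) // [3 4 5 6] 4 4
}
```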
Aadhavan Srinivasan 8a69ea8cb7 Added unique array data structure - O(1) addition and retrieval (I think)
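A set with O(1) average addition and lookup is commonly built on a Go map with empty-struct values. This sketch of a `uniq_arr`-like type builds the element list only on demand, in the spirit of commit 7d265495f5 above. The name `uniqArr` and its methods are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// uniqArr is a hypothetical set of ints backed by a map: add, remove and
// contains are O(1) on average.
type uniqArr struct {
	backing map[int]struct{}
}

func newUniqArr() uniqArr {
	return uniqArr{backing: make(map[int]struct{})}
}

// add inserts v; adding a value that is already present has no effect.
func (u uniqArr) add(v int) { u.backing[v] = struct{}{} }

// remove deletes v without scanning any list.
func (u uniqArr) remove(v int) { delete(u.backing, v) }

func (u uniqArr) contains(v int) bool {
	_, ok := u.backing[v]
	return ok
}

// values builds a slice of the elements only when asked, instead of keeping a
// parallel list that would make remove O(n). The sort only makes the output
// deterministic, since map iteration order is unspecified.
func (u uniqArr) values() []int {
	out := make([]int, 0, len(u.backing))
	for v := range u.backing {
		out = append(out, v)
	}
	sort.Ints(out)
	return out
}

func main() {
	u := newUniqArr()
	for _, v := range []int{3, 1, 3, 2, 1} {
		u.add(v)
	}
	u.remove(3)
	fmt.Println(u.contains(2), u.contains(3)) // true false
	fmt.Println(u.values())                   // [1 2]
}
```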
Aadhavan Srinivasan ea17251bf8 Might have made a change to improve performance
Aadhavan Srinivasan e8aca8606a Added test cases
Aadhavan Srinivasan 9698c4f1d8 Fixed error in calculating word boundary (off-by-one)
Aadhavan Srinivasan c032dcb2ea Added more test cases
Aadhavan Srinivasan 269e2d0e1c Updated go.mod
Aadhavan Srinivasan 21142e6e13 Wrote function to clone the NFA starting at a given state, and a function to find the question mark operator (a? == (a|))
Aadhavan Srinivasan b602295bee Added support for specifying how often a postfixNode is repeated
Aadhavan Srinivasan 1d9d1a5b81 Fixed calculation of overlapping (used to check for subset instead)
Aadhavan Srinivasan d8f52b8ccc Added support for numeric specifiers, moved question mark operator to its own function
Aadhavan Srinivasan dca81c1796 Replaced rune-slice parameters with string parameters in functions; avoids unnecessary conversion from strings to rune-slices
Aadhavan Srinivasan 723be527fb Updated TODO
Aadhavan Srinivasan fccd3a76f5 Wrote function to check if the assertion of a state is true
Aadhavan Srinivasan 315f68df12 Fixed typo
Aadhavan Srinivasan fd957d9518 Added more test cases
Aadhavan Srinivasan 19dc5064c8 Made conditions for word boundary a little more relaxed
Aadhavan Srinivasan a19d409796 Set node type to ASSERTION if the character represents an assertion
Aadhavan Srinivasan 0736e813c1 Fixed boneheaded mistake with checking assertion types
Aadhavan Srinivasan 1aff6e2fa4 Added a field to State that tells me what kind of assertion (if any) it is making. Also added a function to check if a state's contents contain a given value (checks assertions), and one to find all matches that a state has for a character
Aadhavan Srinivasan f3bf5e9740 Added function to check for word boundaries and delete an element from a slice
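A word boundary sits between two positions where exactly one side is a word character. Positions before the start and after the end count as non-word. This is a minimal sketch over a rune-slice, assuming `\w` means letters, digits and `_`. `isWordBoundary` is a hypothetical name, not necessarily the project's function.

```go
package main

import (
	"fmt"
	"unicode"
)

// isWordChar treats letters, digits and '_' as word characters (an assumption).
func isWordChar(r rune) bool {
	return r == '_' || unicode.IsLetter(r) || unicode.IsDigit(r)
}

// isWordBoundary reports whether position idx (0 <= idx <= len(str)) lies
// between a word character and a non-word character.
func isWordBoundary(str []rune, idx int) bool {
	before := idx > 0 && isWordChar(str[idx-1])
	after := idx < len(str) && isWordChar(str[idx])
	return before != after
}

func main() {
	s := []rune("hi there")
	var bounds []int
	for i := 0; i <= len(s); i++ { // note: <=, the end of input is a valid position
		if isWordBoundary(s, i) {
			bounds = append(bounds, i)
		}
	}
	fmt.Println(bounds) // [0 2 3 8]
}
```

Positions run from 0 to `len(s)` inclusive. That edge is exactly where an off-by-one error in boundary checking tends to hide.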
Aadhavan Srinivasan 20db62c596 Got rid of function that I don't need anymore
Aadhavan Srinivasan 360bdc8e11 Big rewrite - assertion handling, zero-match fixes, change in recursive calls
I added support for transitions. I wrote a function to determine if
a given state has transitions for a character at a given point in the
string. This helps me check if the current state has an assertion, and
take actions based on that.

I also fixed zero-length matching (almost, see todo.txt). It works for
nearly all cases I could think of, although I still need to write more
tests. I wrote a function to check if zero-length matches are possible
with a given state.

I also changed the way recursive calls work. Rather than passing a
modified string, the function stores the location in the input string.
This location is updated with each call to the function.

Finally, the function now increments the offset by 1 instead of
incrementing by the length of the longest match. This leads to a bit of
overhead, e.g. if a regex matches indices 1-5, then 1-5, 2-5, 3-5 and 4-5 are
all stored. To fix this, I wrote (and used) a function to check if
a match overlaps with any matches in a slice.
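Overlapping is a broader condition than nesting; commit 1d9d1a5b81 above fixes a check that tested for a subset instead. This sketch uses half-open `[start, end)` spans. The fields of `MatchIndex` and the helper names are assumptions.

```go
package main

import "fmt"

// MatchIndex is a hypothetical half-open span [start, end).
type MatchIndex struct{ start, end int }

// overlaps reports whether two spans share at least one position.
func overlaps(a, b MatchIndex) bool {
	return a.start < b.end && b.start < a.end
}

// isSubset reports whether a lies entirely inside b, a narrower test.
func isSubset(a, b MatchIndex) bool {
	return a.start >= b.start && a.end <= b.end
}

// overlapsAny reports whether m overlaps any span already in ms.
func overlapsAny(m MatchIndex, ms []MatchIndex) bool {
	for _, other := range ms {
		if overlaps(m, other) {
			return true
		}
	}
	return false
}

func main() {
	kept := []MatchIndex{{1, 5}}
	fmt.Println(overlapsAny(MatchIndex{2, 5}, kept)) // true: a suffix of 1-5
	fmt.Println(overlapsAny(MatchIndex{5, 7}, kept)) // false: touches, shares nothing
	c := MatchIndex{3, 7}
	fmt.Println(overlaps(c, kept[0]), isSubset(c, kept[0])) // true false
}
```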
Aadhavan Srinivasan 8dbecde3ae Added support for detecting assertion characters; changed input so that newline isn't required
Aadhavan Srinivasan a752491563 Added more test cases
Aadhavan Srinivasan 656c506aa8 Wrote function to provide correct node for escaped character
Aadhavan Srinivasan 1bafdcdb7e Added support for inverted matches; moved escape character detection to its own function
Aadhavan Srinivasan 5f4a6c5a3b Added constants for LBRACKET and RBRACKET
Aadhavan Srinivasan e6c607319c Added more tests
Aadhavan Srinivasan 8e8e9e133f Fixed matching greediness, e.g. a(a|b)*a would not match 'aaa' in 'aaab'
Aadhavan Srinivasan a619fd24f6 Added map and reduce functions, and a function to return the difference between two sets
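Generic helpers of this kind can be written with Go type parameters (Go 1.18+). `Map`, `Reduce` and `Difference` are hypothetical names and signatures, not the project's.

```go
package main

import "fmt"

// Map applies f to every element of s.
func Map[T, U any](s []T, f func(T) U) []U {
	out := make([]U, 0, len(s))
	for _, v := range s {
		out = append(out, f(v))
	}
	return out
}

// Reduce folds s into a single value, starting from init.
func Reduce[T, U any](s []T, init U, f func(U, T) U) U {
	acc := init
	for _, v := range s {
		acc = f(acc, v)
	}
	return acc
}

// Difference returns the elements of a that do not appear in b, keeping a's order.
func Difference[T comparable](a, b []T) []T {
	exclude := make(map[T]struct{}, len(b))
	for _, v := range b {
		exclude[v] = struct{}{}
	}
	var out []T
	for _, v := range a {
		if _, found := exclude[v]; !found {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	nums := []int{1, 2, 3, 4}
	fmt.Println(Map(nums, func(n int) int { return n * n }))             // [1 4 9 16]
	fmt.Println(Reduce(nums, 0, func(acc, n int) int { return acc + n })) // 10
	fmt.Println(string(Difference([]rune("abcd"), []rune("bc"))))        // ad
}
```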
Aadhavan Srinivasan f8ee1b3200 Added more tests
Aadhavan Srinivasan a66e8f1c08 Concatenate every character if it is escaped
Aadhavan Srinivasan d8299294ed Added test cases
Aadhavan Srinivasan 45d348e7f4 Updated TODO
Aadhavan Srinivasan 7b815343f4 Removed exclamation mark in inverted metacharacters - had the opposite effect because of the way deleteFunc works
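Assume `deleteFunc` here refers to Go's `slices.DeleteFunc` (Go 1.21+) or behaves like it. Then the predicate selects elements to remove, so an inverted class such as `\D` deletes digits with an un-negated predicate. This sketch uses a hypothetical `nonDigits` helper.

```go
package main

import (
	"fmt"
	"slices"
	"unicode"
)

// nonDigits returns the runes of chars that are not digits, as \D would match.
// slices.DeleteFunc removes every element for which the predicate is true, so
// the predicate names what to drop; negating it would keep only the digits.
func nonDigits(chars []rune) []rune {
	// Clone first: DeleteFunc modifies its argument in place.
	return slices.DeleteFunc(slices.Clone(chars), unicode.IsDigit)
}

func main() {
	fmt.Println(string(nonDigits([]rune("ab12_")))) // ab_
}
```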
Aadhavan Srinivasan 1a7fd12569 Added support for some escaped metacharacters
Aadhavan Srinivasan b8d5ea0897 Wrote function to create a character node regardless of the contents of the node
Aadhavan Srinivasan 445a7247f8 Defined variables to provide ranges of characters for metacharacters
Aadhavan Srinivasan ca945c7740 Added support for character ranges and dot metacharacter
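Expanding a range such as `[a-e]` amounts to enumerating the code points between its endpoints. This is a minimal sketch. `charRange` and `digitChars` are hypothetical names for the kind of helper and predefined range these commits describe.

```go
package main

import "fmt"

// charRange returns every rune from lo to hi inclusive, as a bracket
// expression like [a-e] would match. It returns nil if hi < lo.
func charRange(lo, hi rune) []rune {
	if hi < lo {
		return nil
	}
	out := make([]rune, 0, hi-lo+1)
	for r := lo; r <= hi; r++ {
		out = append(out, r)
	}
	return out
}

// digitChars is a hypothetical predefined range for a metacharacter like \d.
var digitChars = charRange('0', '9')

func main() {
	fmt.Println(string(charRange('a', 'e'))) // abcde
	fmt.Println(len(digitChars))             // 10
}
```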