package main
// A matchIndex represents a match. It contains the start index and end index of the match.
type matchIndex struct {
	startIdx int
	endIdx   int
}
// Big rewrite - assertion handling, zero-match fixes, change in recursive calls
//
// I added support for transitions. I wrote a function to determine if
// a given state has transitions for a character at a given point in the
// string. This helps me check if the current state has an assertion, and
// take actions based on that.
//
// I also fixed zero-length matching (almost, see todo.txt). It works for
// nearly all cases I could think of, although I still need to write more
// tests. I wrote a function to check if zero-length matches are possible
// with a given state.
//
// I also changed the way recursive calls work. Rather than passing a
// modified string, the function stores the location in the input string.
// This location is updated with each call to the function.
//
// Finally, the function now increments the offset by 1 instead of
// incrementing by the length of the longest match. This leads to a bit of
// overhead eg. if a regex matches index 1-5, then 1-5, 2-5, 3-5, 4-5 are
// all stored. To fix this, I wrote (and used) a function to check if
// a match overlaps with any matches in a slice.
//
// 2 months ago

// Returns true if the given matchIndex is an improper subset of any of the indices in the slice.
// When we add an index to our slice, we want to make sure a larger match isn't already present.
func overlaps(idx matchIndex, idxes []matchIndex) bool {
	for _, val := range idxes {
		if idx.startIdx >= val.startIdx && idx.endIdx <= val.endIdx {
			// A zero-length match doesn't overlap if it is located at the start or end
			// of the other match
			if !(idx.startIdx == idx.endIdx && (idx.startIdx == val.startIdx || idx.startIdx == val.endIdx)) {
				return true
			}
		}
	}
	return false
}

// takeZeroState takes the 0-state (if such a transition exists) for all states in the
// given slice. It returns the resulting states. If any of the resulting states is a 0-state,
// the second parameter is true.
func takeZeroState(states []*State) (rtv []*State, isZero bool) {
	for _, state := range states {
		if len(state.transitions[EPSILON]) > 0 {
			rtv = append(rtv, state.transitions[EPSILON]...)
		}
	}
	for _, state := range rtv {
		if len(state.transitions[EPSILON]) > 0 {
			return rtv, true
		}
	}
	return rtv, false
}

// zeroMatchPossible returns true if a zero-length match is possible
// from any of the given states.
// It uses the same algorithm to find zero-states as the one inside the loop,
// so I should probably put it in a function.
func zeroMatchPossible(states ...*State) bool {
	zerostates, iszero := takeZeroState(states)
	tempstates := make([]*State, 0)
	tempstates = append(tempstates, states...)
	tempstates = append(tempstates, zerostates...)
	num_appended := 0 // Number of unique states added to tempstates
	for iszero {
		zerostates, iszero = takeZeroState(tempstates)
		tempstates, num_appended = unique_append(tempstates, zerostates...)
		if num_appended == 0 { // Break if we haven't appended any more unique values
			break
		}
	}
	for _, state := range tempstates {
		if state.isEmpty && state.assert == NONE && state.isLast {
			return true
		}
	}
	return false
}

// findAllMatches tries to match the regex represented by the given start-state
// against the given string, and returns the indices of all matches found.
func findAllMatches(start *State, str string) (indices []matchIndex) {
	return findAllMatchesHelper(start, str, make([]matchIndex, 0), 0)
}

func findAllMatchesHelper(start *State, str string, indices []matchIndex, offset int) []matchIndex {
	// Base case - exit if offset exceeds string's length
	if offset > len(str) {
		return indices
	}
	// 'Base case' - if we are at the end of the string, check if we can add a zero-length match
	if offset == len(str) {
		// Get all zero-state matches. If we can get to a zero-state without matching anything, we
		// can add a zero-length match. This is all true only if the start state itself matches nothing.
		if start.isEmpty && start.assert == NONE {
			if zeroMatchPossible(start) {
				if !overlaps(matchIndex{offset, offset}, indices) {
					indices, _ = unique_append(indices, matchIndex{offset, offset})
				}
			}
		}
		return indices
	}
	foundPath := false
	startIdx := offset
	endIdx := offset
	currentStates := make([]*State, 0)
	tempStates := make([]*State, 0) // Used to store states that should be used in next loop iteration
	i := offset       // Index in string
	startingFrom := i // Store starting index
	// Increment until we hit a character matching the start state (assuming not 0-state)
	if !start.isEmpty {
		for i < len(str) && !start.contentContains(str, i) {
			i++
		}
		startIdx = i
		startingFrom = i
		// Advance to next character (if we aren't at a 0-state, which doesn't match anything),
		// so that we can check for transitions. If we advance at a 0-state, we will never get
		// a chance to match the first character.
		i++
	}
	currentStates = append(currentStates, start)
	// Hold a list of match indices for the current run. When we
	// can no longer find a match, the match with the largest range is
	// chosen as the match for the entire string.
	// This allows us to pick the longest possible match (which is how greedy matching works).
	tempIndices := make([]matchIndex, 0)
	// Main loop
	for i < len(str) {
		foundPath = false
		zeroStates := make([]*State, 0)
		// Keep taking zero-states, until there are no more left to take.
		// Objective: If any of our current states have transitions to 0-states, replace them
		// with the 0-state. Do this until there are no more transitions to 0-states, or there
		// are no more unique 0-states to take.
		zeroStates, isZero := takeZeroState(currentStates)
		tempStates = append(tempStates, zeroStates...)
		num_appended := 0
		for isZero {
			zeroStates, isZero = takeZeroState(tempStates)
			tempStates, num_appended = unique_append(tempStates, zeroStates...)
			if num_appended == 0 { // Break if we haven't appended any more unique values
				break
			}
		}
		currentStates, _ = unique_append(currentStates, tempStates...)
		tempStates = nil
		// Take any transitions corresponding to current character
		numStatesMatched := 0    // The number of states which had at least 1 match for this round
		assertionFailed := false // Whether or not an assertion failed for this round
		for _, state := range currentStates {
			matches, numMatches := state.matchesFor(str, i)
			if numMatches > 0 {
				numStatesMatched++
				tempStates = append(tempStates, matches...)
				foundPath = true
			}
			if numMatches < 0 {
				assertionFailed = true
			}
			if state.isLast {
				endIdx = i
				tempIndices, _ = unique_append(tempIndices, matchIndex{startIdx, endIdx})
			}
		}
		if assertionFailed && numStatesMatched == 0 { // Nothing has matched and an assertion has failed - abort
			if i == startingFrom {
				i++
			}
			return findAllMatchesHelper(start, str, indices, i)
		}
		// Recursion - match with rest of string if we have nowhere to go.
		// First check if we can find a zero-length match
		if !foundPath {
			if zeroMatchPossible(currentStates...) {
				tempIndices, _ = unique_append(tempIndices, matchIndex{startIdx, startIdx})
			}
			// If we haven't moved in the string, increment the counter by 1
			// to ensure we don't keep trying the same string over and over.
			// if i == startingFrom {
			startIdx++
			// i++
			// }
			// Get the maximum index-range from the list
			if len(tempIndices) > 0 {
				indexToAdd := Reduce(tempIndices, func(i1 matchIndex, i2 matchIndex) matchIndex {
					r1 := i1.endIdx - i1.startIdx
					r2 := i2.endIdx - i2.startIdx
					if r1 >= r2 {
						return i1
					}
					return i2
				})
				if !overlaps(indexToAdd, indices) {
					indices, _ = unique_append(indices, indexToAdd)
				}
			}
			return findAllMatchesHelper(start, str, indices, startIdx)
		}
		currentStates = make([]*State, len(tempStates))
		copy(currentStates, tempStates)
		tempStates = nil
		i++
	}
	// End-of-string reached. Go to any 0-states, until there are no more 0-states to go to.
	// Then check if any of our states are in the end position.
	// This is the exact same algorithm used inside the loop, so I should probably put it in a function.
	zeroStates, isZero := takeZeroState(currentStates)
	tempStates = append(tempStates, zeroStates...)
	num_appended := 0 // Number of unique states added to tempStates
	for isZero {
		zeroStates, isZero = takeZeroState(tempStates)
		tempStates, num_appended = unique_append(tempStates, zeroStates...)
		if num_appended == 0 { // Break if we haven't appended any more unique values
			break
		}
	}
	currentStates = append(currentStates, tempStates...)
	tempStates = nil
	for _, state := range currentStates {
		// Only add the match if the start index is in bounds. If the state has an assertion,
		// make sure the assertion checks out.
		if state.isLast && startIdx < len(str) {
			if state.assert == NONE || state.checkAssertion(str, len(str)) {
				endIdx = i
				tempIndices, _ = unique_append(tempIndices, matchIndex{startIdx, endIdx})
			}
		}
	}
	// Get the maximum index-range from the list
	if len(tempIndices) > 0 {
		indexToAdd := Reduce(tempIndices, func(i1 matchIndex, i2 matchIndex) matchIndex {
			r1 := i1.endIdx - i1.startIdx
			r2 := i2.endIdx - i2.startIdx
			if r1 >= r2 {
				return i1
			}
			return i2
		})
		if !overlaps(indexToAdd, indices) {
			indices, _ = unique_append(indices, indexToAdd)
		}
	}
	// Default - call on empty string to get any trailing zero-length matches
	return findAllMatchesHelper(start, str, indices, startIdx+1)
}
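
// zeroClosure is a hypothetical helper, not part of the original code. It is
// a minimal sketch of the function that the comments above say the repeated
// "keep taking zero-states" loop should probably be moved into. It mirrors
// the loop in zeroMatchPossible: starting from the given states, it follows
// 0-state transitions until no new unique states are found, and returns the
// given states together with every state reached that way.
// Assumption: unique_append returns the extended slice and the number of
// unique items appended, as its call sites in this file suggest.
func zeroClosure(states []*State) []*State {
	zeroStates, isZero := takeZeroState(states)
	closure := make([]*State, 0, len(states)+len(zeroStates))
	closure = append(closure, states...)
	closure = append(closure, zeroStates...)
	for isZero {
		var added int
		zeroStates, isZero = takeZeroState(closure)
		closure, added = unique_append(closure, zeroStates...)
		if added == 0 { // No new unique states were reached, so stop
			break
		}
	}
	return closure
}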