Given a simple regex that allows characters, dots and stars, find if a given text matches a given pattern.
dot '.' = any character
star '*' = 0 or more repetitions of previous character
We have already seen a slightly more complex regular expression matcher which does a similar thing but I realized that matcher did NOT use caching.
The complexity becomes exponential due to the behavior of star, since "0 or more" means we have a choice: match on this char and continue with same pattern or keep this char and continue with next piece of pattern.
For example:
text = abbbb
pattern = ab*b
Due to this possible choice, our complexity is O(2^N) for both time and space.
We can cache however the pairs (index of text char, index of pattern char) where failed to find a match, so that for repeated patterns example:
text = abbbb
pattern = ab*b*b*.*.*b
When we reach a portion that we already investigated in the past, we skip it and continue
This uses O(N^2) space and O(N^2) time as we check each pair once.
We have to be very careful in our base cases:
text = "", pattern = "" is a match
text = "", pattern = ".*" or any char followed by star or any sequence of chars followed by stars (.*b*n*), is a match
text = "", pattern = "." or any sequence, is not a match
text = anything, pattern = "", is not a match
We therefore find the following cases:
- pattern is not a star: try matching with text and continue if successful
- pattern is a star: branch options (1) match on this char and continue on next char with same pattern or (2) stay on this char and continue on next char in pattern
If we encounter a failure when doing any of these inspections, we cache it
If at any point we can make a successful match, we stop and return true.
You can check my implementation of regexMatch on my Gist along with some tests in RegexMatchJTests.
No comments:
Post a Comment
With great power comes great responsibility