In this post, through the wildcard matching problem, I want to highlight the fact that dynamic programming is very intuitive, and once you are familiar with recursion, dynamic programming will add to it a little idea, but this idea improves the performance of programs drastically.
I think one of the most important questions about Dynamic Programming is how do we spot that the problem is a DP problem. Hopefully at the end of this article, you learn how to do that “naturally” (meaning with the flow of ideas in your mind) and not by asking the question whether this is a DP problem or not.
Given an input string (
s) and a pattern (
p), implement wildcard pattern matching with support for
'?'Matches any single character.
'*'Matches any sequence of characters (including the empty sequence).
The matching should cover the entire input string (not partial).
Input: s = "aa", p = "a"
Explanation: "a" does not match the entire string "aa".
Input: s = "aa", p = "*"
Explanation: '*' matches any sequence.
Input: s = "cb", p = "?a"
Explanation: '?' matches 'c', but the second letter is 'a', which does not match 'b'.
Input: s = "adceb", p = "*a*b"
Explanation: The first '*' matches the empty sequence, while the second '*' matches the substring "dce".
Input: s = "acdcb", p = "a*c?b"
In this problem, we have 2 strings S and P. We need to know whether P is a pattern of S.
The idea is pretty straightforward: scan S and P while there is a match between the current character from S and the current character from P. If we reach the end of both strings while there is still a match, we return True, otherwise we return False. The scan is done by having a pointer in S and a pointer in P.
The character ‘c’ of S matches the first character of P if the first character of P is:
When the first character from P is a lowercase letter different from ‘c’, we return False.
If the first character of P is ‘c’ or ‘?’, we move both pointers one step to the right.
If the first character of P is ‘*’, we have 2 possibilities:
- ‘*’ matches 0 character from S: in this case we move the pointer in P one step
- ‘*’ matches 1 or more characters: in this case we move the pointer in S one step
And we continue like this for each two positions taken by the two pointers.
If we reach the end of P but there is still characters from S, we simply return .. False !
If we reach the end of S and there is still characters from P, the only case when there is a match is that all the remaining characters in P are ‘*’, in this case these stars will be matched with the empty string.
Here is the code:
The problem with the solution above is that there are many repeated computations. Add print(start_s, start_p) at the beginning of the helper function and you will see that there are many calls to helper with the same value of the couple (start_s, start_p).
So the solution to this is simply memoize (=cache) the work we have done already. Here is the code with caching:
I removed the returns and replaced them with assignment to res so that I don’t repeat the statement self.memo[(start_s, start_p)] = res in each if statement.
This kind of solution is called Top-Down DP, because we started from the biggest problem (the whole S and the whole P) to smaller problems.
A top-down DP solution (recursive) may be converted to a Bottom-up solution (iterative) that goes from smaller to larger problems.
A top-down solution doesn’t compute a sub-result before needing it. Bottom-up knows that it will need this subresult so it computes it before moving to the next level.
And because the cache keys in the top-down version were couples (i, j) with i between 0 and N and j between 0 and M. We usually work with a matrix instead of a hashtable in the bottom-up version (we may have done it in the top-down version as well). Hashtable is more costly in this case due to key collisions during search, and memory waste to ensure its O(1) average time complexity of search.
Here is the bottom-up version:
I kept the code a little bit long so that you can do the mappings between the Top-Down version and the bottom-up version. They do exactly the same thing !
Hope this helps someone change the way he sees dynamic programming as an iterative tabular modification that is hard to think about to a small change in a recursive brute force solution.