Pages

05/03/2018

[Java] Find if two strings are interleaved

A string is the result of interleaving two other strings if it contains ALL characters from both strings in the same order as they appear in the source strings. When determining which character to place in the interleaved string, it is possible to choose each time 0+ characters from either string!

Eg:
ABACCA is the interleaving of ABC and ACA -> AB, A, C, CA
ABACAC is also the interleaving of ABC and ACA -> AB, AC, A, C
ABAACC is NOT the interleaving of ABC and ACA -> AB, A, we expect now a C but find an A instead

The 0+ choice makes things a bit hard, contrarily to the O(N) case where we could pick only 1 character from each string at a time.
It means we might create the result string in more than one way and need therefore to examine all the possibilities before returning our verdict.


The idea is to consider the following base cases:
  • all null or empty strings are interleaved
  • if one source string is null, other two must match
  • if length of source does not add up to target, for sure there is no interleaving
Then, for each character in the string to test, we check whether it matches the first remaining character in either source string, and if so, repeat the process considering the remainder of all strings.

Here is a partial decision tree for ABC, ACA, ABACCA: start trying to match A and see that it matches both source strings, therefore we move on to investigate if BC, ACA, BACCA or ABC, CA, BACCA produce a match.
For BC, ACA, BACCA, we see that B matches one of the strings so we move on to test C, ACA, ACCA.
For C, ACA, ACCA, A matches the second string, so now check C, CA, CCA.
For C, CA, CCA, C matches both strings so verify if null, CA, CA or C, A, CA produces a match.
null, CA, CA is a base case and we identify it as true since CA = CA, we can return up the recursion tree.

We keep going until we reached the end of the target string.

This will quickly get out of hand if the strings overlap in multiple places, for example input AAAA, AAAA, AAAAAAAA will result in 139 function calls.

A better solution therefore is to track for both source strings, if we already found a match for the specific combination of characters left. This way, the previous example goes to only 33 function calls.

Adding a return true in case we find a match on the first string, therefore avoiding altogether to examine the remainder of the second one, brings that total further down to only 5 calls; of course it won't help in case the second string matches first..

Then we might also think of doing it in parallel, but maybe we should also go rest :)

By the way, I know for a fact that it's possible to solve this using a single 2D matrix and nested loops, but that solution did pop into my mind within the target time so.. keep it for next time

I am still wondering about the exact complexity in both cases; for sure the second one will use extra O(MxN) space where M and N is the length of the two source strings.

You can check my implementation of both findInterleaveNoDP and findInterleave on my Gist alongside some test cases for both in InterleavingJTests.

No comments:

Post a Comment

With great power comes great responsibility