04/02/2018

[Java] Find longest word that can be made as subsequence of given word

Here is another interesting problem: given a string and a list of strings, determine, if any, the longest string from the list that can be built as a subsequence of the given string, by only deleting characters in it. For example, given "asd" and {"a", "ccc", "sd", "sda"} return "sd".

The complex part when working with string is as usual the fact that we potentially need to scan the string many time over and over in order to reach our goal.

In this case, we could simply check for each given string in the list, if it can be built from the given input string by looping over that and match each character until we have our answer; this takes O(L) time where L is the length of the INPUT string for a SINGLE word. So overall we would need O(NxL) time where N is the number of strings in the list.

We know we can do some preprocessing to make our runtime better, at the expense of space of course. Since the input string can be very long and our result can start anywhere in there, we could do this in O(N+L) time by simultaneously process ALL words (N) for each character in the input string(L), using an additional O(N) space.

The idea is this: process each string in the list to find which one is the first character we need to match and add an entry to a map indexed by that string and with value the current matched length; then add this map to another map indexed by the character itself. So:

 Map<Character, Map<String, Integer>>  


that represents the group of words from our list that need the same specific character as next from the input string to proceed in the creation. When the matched length is the same as the word, we can remove it from the map forever and verify if it's the current longest one and track it in case.

Note: we could also use a List instead of a map where we keep simply the tuple(String, Integer) but, since we need to alter the content of the List (with O(N) removal operations for each fully matched word), we might take a slight performance hit compared to using the O(1) Map (although remember about the collision possibility) instead.

Then we loop over each character in the input string and we only process the subset of words that need specifically that character to progress. For each of these words we check if we managed to build it completely or which character is the next one we need; in this last case, we increase the matched length and move the word into the proper group for the next character we require.

When we completely parsed the input string, we have our answer.

The only thing that remains to check is: are we completely sure we are not spending any more time parsing any of the string? For example the charAt method which we need to progress on each word, will it parse the full string every time? Turns out, luckily not - it's constant time - YAY! :

 public char charAt(int index) {  
   if ((index < 0) || (index >= value.length)) {  
     throw new StringIndexOutOfBoundsException(index);  
   }  
   return value[index];  
 }  



You can find my implementation of getLongestSubsequence on my Gist alongside some test cases in LongestSubsequenceJTests.

No comments:

Post a Comment

With great power comes great responsibility.

Da grandi poteri derivano grandi responsabilità.