18/05/2021

[Java] Find all words that can be created from a given phone number

While looking at this video, I learned about the Aho-Corasick algorithm which is an interesting way of searching a string for matches against a given set of strings.

Given a set of words and a phone number, we want to scan the whole phone number and trace which words can be constructed from it or portions of it.

Remember that a phone keypad has the following associations:

0 and 1: no characters
2: A, B, C
3: D, E, F
4: G, H, I
5: J, K, L
6: M, N ,O
7: P, Q, R, S
8: T, U, V
9: W, X, Y, Z

Therefore from number 22 we can create the following strings:

AA, AB, AC, BB, BC, BA, CC, CB, CA

Assuming our words are: 

AA, CA, CD

We can then say that for phone number "22" we can create only:

AA, CA

If we simplify the problem and consider a static string as input, then Aho-Corasick gives us a nice O(length_of_string) time to find all possible solutions.

In our case we can still apply this logic, however we need to consider all possible characters that can be associated with a digit in our string.

The logic is therefore similar to what we did in findMatchingStringsInString from the last exercise, but with a couple adjustments:

  • we create a static map to retrieve the set of possible characters associated with each digit
  • when we loop over the phone number we retrieve the possible characters for each digit and loop over them, branching the logic to start with each possible connection from there
  • if we encounter a digit that has no characters associated (eg 1 or 0) we restart from the root of the trie and try to match a new word starting from the empty string
  • whenever we find a valid node, we add all its words to our result set

The complexity here is different than before: building the Aho-Corasick trie stays same in O(sum of length of all words) + O(nodes), which is  O(sum of length of all words) worst case, while the search is now O(possible chars per digit ^ length of string) which in our problem is exactly O(4 ^ length of string) in the worst case, eg when the phone number is composed by only 7s or 9s.

The final result then would be O(4 ^ length of string).


You can find my implementation of findMatchingWordsInPhoneNumber on my Gist, along with some tests in FindWordsInPhoneNumberJTests.



No comments:

Post a Comment

With great power comes great responsibility