18/05/2021

[Java] Find all words that can be created from a given phone number in linear time

The previous solution to this problem has a very high cost, the reason for that is the way we cycle on the input. We keep it as a string of digits, and try all combinations  for each one building an hypothetical string as we go.

A better approach which achieves linear complexity instead is to perform additional preprocessing and also some postprocessing:

  • create a static map from character to the digit it has on the same key on the keypad. Since this is static cost is O(1)
  • convert all words to their digit representation. If multiple words have the same representation, collect all of them. This is another map using the digit representation as key and a set of words matching that representation as value. This is in O(sum of length of all words) and we also use a StringBuffer to save on creating too many String objects
  • build the trie using the digit representation as set of matching words. This is O(number of words)
  • run Aho-Corasick searching our phone number as input. This is O(length of phone number)
  • for each matching word we found (which is now its digit representation), retrieve all words that have that same representation and add them to the result set. This is O(number of words) 

This is another case where using a little extra space provides a huge improvement on the overall runtime.

We now have linear complexity, with the worst case being the highest between: sum of length of all words, number of words or length of phone number.

You can check my implementation of findMatchingWordsInPhoneNumberAsDigits on my Gist along with some tests in FindWordsInPhoneNumberAsDigitsJTests.

[Java] Find all words that can be created from a given phone number

While looking at this video, I learned about the Aho-Corasick algorithm which is an interesting way of searching a string for matches against a given set of strings.

Given a set of words and a phone number, we want to scan the whole phone number and trace which words can be constructed from it or portions of it.

Remember that a phone keypad has the following associations:

0 and 1: no characters
2: A, B, C
3: D, E, F
4: G, H, I
5: J, K, L
6: M, N ,O
7: P, Q, R, S
8: T, U, V
9: W, X, Y, Z

Therefore from number 22 we can create the following strings:

AA, AB, AC, BB, BC, BA, CC, CB, CA

Assuming our words are: 

AA, CA, CD

We can then say that for phone number "22" we can create only:

AA, CA

[Java] Aho-Corasick Trie for string search and matching

We worked on Tries in the past and now we take a look at a string search algorithm using Aho-Corasick which gives a nice O(string_length) time AFTER a preprocessing of all the desired search strings.

I found the following videos helpful in understanding the algorithm:

Clearly this approach is most effective when we want to do repeated searches for same patterns in multiple strings.

11/05/2021

[Java] Find minimum number of connections to create fully connected graph from start vertex

In case you were wondering why did we implement the path-based, Tarjan and Kosaraju algorithms for finding strongly connected components, this video quiz is the reason.


We are given a graph representing airline routes and a start city. We need to determine the minimum number of connections to add from the start city so that every other destination is reachable in any number of hops from there.

03/05/2021

[Java] Path based algorithm for finding strongly connected components in a graph

We've already seen both Tarjan's and Kosaraju's algorithms for finding the strongly connected components in a graph. We now take a look at another easy implementation, the path-based algorithm

Supposedly, Dijkstra was the first one to propose a O(V + E) time solution to the problem. I could find a reference to his idea in the book "A discipline of programming" chapter 25 page 192.

[Java] Tarjan's algorithm for finding strongly connected components in a graph

Last time we saw Kosaraju's algorithm for finding strongly connected components in a graph. This time we look at another approach with Tarjan's algorithm.

Both solutions run in O(V + E) time, however to be more precise, Kosaraju's approach runs in 3(V + E): first DFS, edge reversal, second DFS while Tarjan's approach runs in (V+E) only. Additionally, the required edge reversal step in Kosaraju's approach means we would either modify the graph or add an extra structure to track that information.

02/05/2021

[Java] Kosaraju's algorithm for finding strongly connected components in a graph

In a directed graph (not necessarily fully connected), a strongly connected component is a set of vertices whose edges allow each vertex in the set to reach all others.

A single node is always a SCC.

For example in the following graph:


       2             6 ->  7  
       ^ \           ^     /|  
      /   \          |    / |  
      /    \         |   /  |  
     /      v        |  v   v  
     1 <-  3 -> 4 -> 5 ->  8  

We have 4 SCCs:

  • 1,2,3
  • 4,
  • 5,6,7
  • 8

Kosaraju's algorithm gives us a O(V + E) solution provided we are using adjacency lists and our graph is represented by the list of vertices AND the list of edges.

01/05/2021

[Java] Find longest increasing or decreasing subsequence

 As second question from the last exercise, there was this. Given an input such as:


[
   [true, true, true, false],
   [true, false, false, false, false],
   [true, true, true, true, true, true, false, false]

]
 

Where each internal array ALWAYS has at least 2 elements and starts with a "true". Also, after the first "false" there can ONLY be "false" until the end of the array.

We calculate for each of these arrays the percentage of false elements over all elements, call it pct.


We then look for the longest strictly increasing pct sequence.

An empty input list or a list of only one element should return 0, in all other cases, the sequence starts counting AFTER the starting element.