11/02/2018

[Java] Word squares tester

This exercise looked scarier than it is, although the real difficulty is not much on the algorithm itself but rather on the correct and clean implementation of it.

A “word square” is an ordered sequence of K different words of length K that, when written one word per line, reads the same horizontally and vertically. For example:

BALL
AREA
LEAD
LADY


Given an arbitrary list of words, return all the possible word squares it contains. Reordering is allowed.

For example, the input list:

[AREA, BALL, DEAR, LADY, LEAD, YARD]


should output:

[(BALL, AREA, LEAD, LADY), (LADY, AREA, DEAR, YARD)]
This is also an exercise where solving one instance by hand is extremely useful and reveals a lot of information regarding how to automatize the process.

When solving by hand, we would pick a word, place it in the square and then search for other candidate words that might fit in the empty spaces; we keep going until we either fill the whole square or not.

For each word we can then create a square and initialize the first row and column to the word itself before placing it in a queue.
We keep iterating over the queue and search for the first empty position along the middle axis (i = j), since this is the first point we must fill using a word that matches the length of the square and whose prefix is before the empty spot we just found.
If we can walk the whole axis without finding empty spots, it means we have a solved square and can therefore take note of it.

We rely on a trie to find candidate words given a suffix and a length and keep updating/generating new squares as long as we have a potential path to a solution.

At the end, we can pick all solved squares, if any, and convert them to list of words, which we can combine again to produce the expected output.

Gotchas:
  • one loop should use an variable declared outside so that we can use the loop break point to perform further processing afterwards
  • the very first time we search for holes in a square, we will start from column 0, not 1, even though we know there's already a character there. This helps condense the looping logic and saves some ifs
  • we need a boolean value to skip processing to the next cycle since we can only break out a single loop at a time
  • when analyzing the possibilities, we must create a NEW square each time, since there might be more candidate words for the same spot and also because of referencing vs hard copy issues
You can check my implementation of findWordSquares on my Gist alongside some test cases in WordSquareJTests and the definition of the Trie class as well.

This is a rough version and I would need to modify it further to place some improvements inside, for example some additional preprocessing steps to remove unmatchable candidates and apply a similar logic later on to limit the squares generation too.

No comments:

Post a Comment

With great power comes great responsibility