16/06/2021

[Java] Count ways a message can be decoded and interpreted

Assume we take a plaintext string composed only of alphabetic characters and encode it substituting to each character a value: a = 1, ... , z = 26

Given an encoded string, determine in how many different ways can the message be interpreted. For example

10: only way is digit 10

11: can be digit 11 or digits 1, 1

This can either be done online (reading string left to right) or from a static string (right to left).

The idea is we walk the string and verify at each step that we have a valid sequence, if not we raise error. Invalid sequences are:

  • any non digit
  • 00
  • x0: with x > 2

For each character we look at the following one to determine if there is an increment in the number of ways we can decode the string, given two digits we have the following valid combinations:

  • 10: only one way, the number 10
  • 1x: with x > 0 and x <= 9, two ways, single digits (1 and x) or number (1x)
  • 20: only one way, the number 20
  • 2x: with x > 0 and x <= 6, two ways, single digits (2 and x) or number (2x)
  • xy: with x > 2 and y > 0 and y <= 9, one way, the single digits (x and y)

CAUTION: anytime we encounter a 10 or 20, we must track it as the PREVIOUS digit CANNOT use the 1 or 2 to count as double decoding since we must use that in pair with the previous zero. Example:

  • 110 - only 1 way: 1, 10
  • 1620 - two ways: 16, 20 - 1, 6, 20

Therefore we track finding a zero and lock the previous digit to it, we remove the flag only after we successfully process the previous digit (unless it's again a 0 or 10 or 20)

If we represent this as a graph, each "two ways" possibility represents a branching path, with each subsequent digit updating ALL branching paths, the number of leaves in the end is our result.
This updating ALL branching paths is expensive, in the order of N

We can see that when we look at a new character, we are interested in checking the following one only to determine in which case we are, either branching or not, therefore we can represent this as a rolling sum of possibilities up to two digits before and up to the previous digit.

If the new digit does NOT contribute to a branch, our sum is unaltered, we update our values and continue.
If the new digit DOES contribute to a branch, our sum is the combination of possibilities from the branches we would create, for example:

10 - array of possibilities per each position from last: [1,0]

0
 \
 1,0


we see the 0 which does not contribute to a branch, possibilities are 1
we see the 1 which together with the 0 does not contribute to a branch, possibilities are still 1

11 - array of possibilities per each position from last: [2,1]

    1
  /   \
 11   1,1



we see the 1 which does not contribute to a branch, possibilities are 1
we see the 1 which together with the other 1 does contribute to a branch, possibilities are now 2

111 - array of possibilities per each position from last: [3,2,1]

      1
    /    \
  11      1,1
  /      /     \
11,1   1,1,1  1,11


the new 1 does branch again adding other possibilities to the previous expandable level, total possibilities are 3

We also notice that each new digit expands from the last two digits only, therefore we do not need to remember the whole array, but only the last two values and update them correctly at each iteration.

With this approach we have a O(N) time and O(1) space solution.

You can check my implementation of countWaysToDecode on my Gist along with some tests in CountWaysToDecodeJTests.

No comments:

Post a Comment

With great power comes great responsibility