Pages

04/07/2021

[Java] String length compression decode

Given a text representing a compressed string, return the uncompressed version. 

Compression works as such: if a substring is repeated K times, we represent as K[substring] which can be nested. Example:

K[ssL[a]] means we repeat 'a' L times (call it T) and then repeat 'ssT' K times.

K is always > 0 and the string will always be well formed.

We can use recursion or an explicit stack. We notice that until we find a digit, we are simply building a partial string. When we find a digit, we keep the part read so far and append to it the result of parsing the K[...] portion after us.

We repeat this until we have reached the end of the input.

Using a single stack, we can keep pushing portions to it while we parse the input and when we encounter a closed bracket, we pop from it until we reach the digit.

We can then reconstruct the repeated portion, which we add back to the stack.

At the end, the stack will contain our string but the pieces are in reverse order, so we reverse the stack and have our answer.

This however implies performing multiple operations on the single stack. If instead we use two stacks, we can write simpler code and achieve better complexity since we do not need to iterate on the stack to retrieve all pieces we have read until that point.

We can walk on each character c of the string, and have the following scenarios:

  • c is character, add it to current string we are building and continue
  • c is digit, gather all digits composing the number and store the number in a stack. The string we had up to this point will appear in the output BEFORE the string that will be repeated K times, save that in a stack

Example abc2[d] means abc will appear BEFORE dd which we will be able to build only when we reach the matching closed bracket.

  • c is '[', we handle this case as part of the previous one as a well formed string has format number[string] so we stop reading digits when we encounter the open bracket
  • c is ']', the string we had up to this point must be repeated K times, based on the last number we read. Pop the number from the numbers stack and the string to place BEFORE repeating the current one from the other stack. Then append to the string BEFORE the current one K times the current string. The result is the new current string so far, which is our solution up to this point

When we reach end of the input string, we have the full reconstruction in the StringBuilder.

This runs in O(N) time where N is the size of the original string. We can see this since we iterate over the input string once looking at each character, in some cases we need to do an inner loop to repeat a portion of the string, which is the only time we are doing more work for a given position in the input. Differently from the other case then, we only pop once from the stacks so the only cost is the repetition of the string.

Space complexity is equal to the deepest nesting of segments we have in our compressed string. A segment is a portion of string within brackets.

Given that we store partial strings until we find the last matching bracket of nested segments, at most in our stack we store that many string portions.

The output itself might be bigger than the input, however we need to construct that, therefore it's not counted in the space complexity. To give an upper bound to space therefore we can say it's O(N) as well, as we couldn't possibly push more characters that there are in the result string.

You can check my implementation of decodeString on my Gist along with some tests in DecodeStringJTests.

No comments:

Post a Comment

With great power comes great responsibility