06/06/2021

[Java] Serialize and deserialize a binary tree to a string

Given a binary tree, we need to produce a string representation of it which we can later use to reconstruct the same tree.

The difficulty here is in handling the information correctly, for example a tree with holes (eg not all levels are full) MUST communicate this in our serialize data, otherwise we can't rebuild it correctly.

Now we have seen in the past it's possible to reconstruct a binary tree from its inorder and postorder visits, therefore we could simply do those and serialize them in our string.

Another approach is instead to do a BFS level visit.

Serialize:

Walk the tree and track all values found, including nulls AND children of nulls.

We would then have a "full" tree written down, with placeholders for the places that had no node. Since we are doing a BFS and we are putting nulls in the queue, we need to know when to stop, for that we track at which level of the tree we are and how many nulls have we added for the NEXT level. If the NEXT level is full of nulls, we can stop. Remember that a level can hold maximum 2^level nodes.

We use special markers in our string to delimit values and nulls. This is an interesting point of discussion for example if our tree contains strings as values instead of integers.

Using a StringBuffer greatly helps here to avoid creating too many string objects.

Serialization works in O(N) time and space, where N is the number of nodes in the tree.

Deserialize:

To deserialize instead, we immediately attempt to deserialize the root, so we have the first node to add to our queue, then start another BFS and keep track of our current position on the given string.

For each node we poll from the queue, its children (nulls or otherwise) will be EXACTLY in the current position we are in the string, since that is the logic we applied to create the string in the first place. 

To extract a value from the string, we look for the value separator character, then execute a substring from the current position we are until that character.

When we find a null value marker, we do not add that particular child to the current node, but we still add it to the queue, this will make us move along the input string correctly as if the tree was full.

After we have successfully read a value, we need to offset our position in the string by the length of the substring we just read to extract that value.

When we have reached the end of the string, our reconstruction is complete.

If at any point we find an invalid value, eg two consecutive value delimiters in our string, or a value which is not an int, we raise an error. 

The complexity here is O(M) and O(N) space where M is the length of our string and N is the number of nodes in the tree.

If during the value parsing we had used both indexOf and substring, we would walk on the exact same portion of the string twice. Better to done one single pass and track the read characters until the separator ourselves instead.

The same problem applies to the string to integer with parseInt, where we definitely read the same string portion again after having just read it to extract it.

No comments:

Post a Comment

With great power comes great responsibility