Pages

02/07/2021

[Java] Catalan numbers - Number of unique binary trees with N nodes

Here is another quite difficult problem: given an integer N, find how many valid binary search trees can be built using N nodes.

Being a problem on trees, we can try approaching it with recursion, which means we need to find some formula and base case to generalize the structure.

Drawing this problem out definitely helps reaching a solution, but the number of results grows quite rapidly even for small values of N so honestly spotting a pattern early was not super easy, N = 4 is the first where I think it's possible to see the recurrence.

Base case(s) are quite easy, a tree with 0 and 1 nodes can only be represented in one way: no nodes, or the single root.

Then we need to generalize, for a given root and at least another node, the root is one node and all remaining N-1 nodes can either be:

- all on the left side (keys are smaller than root)

- all on the right side (keys are bigger than root)

- some on the left and some on the right

The last case actually covers the previous two as well, we can simply see it as zero nodes being on the other side.

If some nodes are left and some are right and we are the root, it means some nodes in range 1..root-1 are on the left and nodes in range root+1..N are on the right, if we view all nodes as the numbers in the range 1..N and the root is value X, we will have the full sequence 1..X..N for a single value X.

Therefore for a given root, the number of ways its children can be combined is:

number of ways left subtree can be built * number of ways for right subtree can be built

If we repeat this process for all N possible roots, we get our final answer. Definitely using a cache here helps as the number of combinations rises terribly fast.

I am not sure if having the problem described as counting binary search trees rather than simply binary trees would help of complicate finding a solution, in any case, the answer is same for both.

It also turns out there is a sequence called Catalan numbers which represents exactly this pattern and is a special case of combinations. However, the n choose k formula involves factorials and for even small inputs, those can easily overflow the space of an integer. We can still use those numbers to test our solution though ;)

Both the top down and bottom up cached approaches will use O(N) space and O(N^2) time. As usual this is easier to see in the bottom up approach where we must explicitly run the inner loop to generate all 1..X..N combinations for the value of N being considered at that moment.

You can check my implementation of numOfBSTs and numOfBSTsBottomUp on my Gist along with some tests in NumOfBSTsJTests.

No comments:

Post a Comment

With great power comes great responsibility