[Java] Self balancing AVL binary tree

Oh. my. god.

This has definitely been the hardest one so far; I remembered the traumatic experience from my university days and also remembered why I had buried it deep down.

You can check the glorious Wikipedia for the full description of AVL trees, no need for me to repeat it here. I will however list some of the key points I found while implementing it:
  • draw the thing down. It is the best help you can get while your brain tries to make sense of this magic rocket science
  • TRACK THE HEIGHT, COMPUTE THE BALANCE. This kept me stuck for eons, maybe it's possible to track the balance only as well and perform some more complex update logic, but I have no intention of going down the rabbit hole again for the moment :)
  • after inserting a node, backtrack to the root and perform rotations as needed. Update the heights while backtracking
  • not everyone defines right and left rotations the same way. Sometimes they are inverted in the direction they actually rotate; the end result in balancing terms is the same, but the naming might cause confusion. For me, right rotate means: take the node, pull it down to the right and set its left child as the parent. So for me: balance -2 means left heavy, therefore a right (or left-right) rotation is needed
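To make that convention concrete, here is a minimal sketch of the height bookkeeping and a right rotation under the definition above. Class and method names are mine for illustration, not the ones from the Gist:

```java
// Node with cached height: TRACK THE HEIGHT, COMPUTE THE BALANCE.
class AvlNode {
    int key;
    int height = 1;          // a leaf has height 1
    AvlNode left, right;

    AvlNode(int key) { this.key = key; }
}

class AvlRotations {
    static int height(AvlNode n) { return n == null ? 0 : n.height; }

    // balance < 0 means left heavy, balance > 0 means right heavy
    static int balance(AvlNode n) {
        return n == null ? 0 : height(n.right) - height(n.left);
    }

    static void updateHeight(AvlNode n) {
        n.height = 1 + Math.max(height(n.left), height(n.right));
    }

    // Right rotation: pull the node down right, its left child becomes
    // the new root of the subtree. Update heights bottom-up.
    static AvlNode rightRotate(AvlNode node) {
        AvlNode newRoot = node.left;
        node.left = newRoot.right;
        newRoot.right = node;
        updateHeight(node);      // node moved down, recompute it first
        updateHeight(newRoot);
        return newRoot;
    }

    // Left rotation is the exact mirror image.
    static AvlNode leftRotate(AvlNode node) {
        AvlNode newRoot = node.right;
        node.right = newRoot.left;
        newRoot.left = node;
        updateHeight(node);
        updateHeight(newRoot);
        return newRoot;
    }
}
```

A left-heavy chain 3 → 2 → 1 (balance -2 at the root) becomes the balanced tree rooted at 2 after a single right rotation.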
You can find my implementation on my Gist in the BST package. The builder function is called buildBalancedAVL, and there are extended test cases in BSTJTests.

Lastly, I also link this super useful tool I found that helps with understanding the logic and checking the correctness of the results from the code: online AVL tree visualization


[Java] Merge sorted files

Here is a problem that takes more time to code than to figure out: given N files where each line contains a timestamp (as long) and some data sorted in ascending order, merge them in a single file also sorted in ascending order.
The files can be arbitrarily big and are passed in input as a list of full paths.

Since the files can be arbitrarily big, we might want to avoid loading them all into memory at once; we can instead keep an open pointer to each of them using O(N) additional space.

This is not enough though, since we also need to write entries to the output file in ascending order, which would be a big issue if the input files were not themselves already sorted. Since they are, we can avoid something like a modified mergesort and rely on a min-heap (Java's PriorityQueue) instead.

We read one line from each file and place it in the min-heap, which therefore holds at most N entries (O(N) space) with O(log N) cost per insertion and removal. To understand later on which entry corresponds to which file and reader, we use a custom FileEntry structure as the heap's element type, overriding equals, hashCode and compareTo, and we also keep a HashMap from filename to BufferedReader so that we can quickly read the next line from each input file once we determine which one to read from next.

We keep looping over the heap until it's empty, meaning we read all the lines and also wrote them out. For each element we remove from the heap, we add, if any, a new element (the next line to read) from the same source file; if the file is finished, we remove the reader from the map.

With N files and L total lines, the total runtime is O(L log N) and the additional space is O(N).
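The core loop can be sketched as follows. The comma-separated line format, the class names and taking the readers/writer as parameters are my assumptions for illustration; the real FileEntry also overrides equals and hashCode, and the real code opens the files from their paths and handles IOExceptions, all omitted here:

```java
import java.io.*;
import java.util.*;

// One heap entry per source: the parsed timestamp drives the ordering.
class FileEntry implements Comparable<FileEntry> {
    final long timestamp;
    final String line;     // full original line, written out as-is
    final String source;   // key into the reader map

    FileEntry(String line, String source) {
        this.line = line;
        this.source = source;
        // assume the timestamp is the first comma-separated token
        this.timestamp = Long.parseLong(line.split(",", 2)[0]);
    }

    @Override public int compareTo(FileEntry other) {
        return Long.compare(timestamp, other.timestamp);
    }
}

class SortedMerge {
    static void merge(Map<String, BufferedReader> readers, Writer out)
            throws IOException {
        PriorityQueue<FileEntry> heap = new PriorityQueue<>();
        // seed the heap with the first line of each source: at most N entries
        for (Map.Entry<String, BufferedReader> e : readers.entrySet()) {
            String line = e.getValue().readLine();
            if (line != null) heap.add(new FileEntry(line, e.getKey()));
        }
        // loop until the heap is empty: every poll is O(log N)
        while (!heap.isEmpty()) {
            FileEntry smallest = heap.poll();
            out.write(smallest.line);
            out.write('\n');
            // refill from the same source, if it still has lines
            String next = readers.get(smallest.source).readLine();
            if (next != null) heap.add(new FileEntry(next, smallest.source));
        }
    }
}
```

Since the method only sees Readers and a Writer, it can be exercised with in-memory strings without touching the filesystem.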

The hardest part here is actually the handling of file IOExceptions, as we might have many BufferedReaders active at the same time and need to gracefully try to close all of them if we encounter an error during processing. The algorithm itself is quite short, but unfortunately there is a lot of file handling code around it; some code was also added for more graceful error handling, but it is not strictly necessary.

You can check the implementation of mergeSortedFiles on my Gist along with some test cases (STUB!) in MergeFilesJTests. This time I was too lazy to properly create JUnit tests for these file operations, so please check out the source and edit it as needed. Also, do NOT run ALL the tests together: you will get a "file already exists" exception, since the output file is created but never automatically deleted.


[Java] Implement locking mechanism using a binary tree

This is an interesting exercise: implement a locking mechanism backed by a tree structure with parent pointers, where a node can be locked if and only if none of its ancestors or descendants is locked.
The lock testing operation should run in O(1) time, while the lock and unlock operations should run in O(H) time, where H is the height of the tree.

The fact that we are given the desired runtimes is the hint we need to understand how to implement this and the parent pointer is also a key element.
A lock testing operation in constant time means we are definitely tracking the information we need on the node itself.

An O(H) lock and unlock operation means we cannot scan both subtrees of a node AND all of its ancestors every time we need to check whether any of them contains an already locked node.
The way we can accomplish this, then, is by tracking an additional piece of information in each node and increasing the complexity of both operations to keep this information updated.

If, for example, each node tracks the number of locked descendants in its subtrees, then during the lock operation we only need to check that this counter is zero, that the node itself is unlocked and that no ancestor is locked, and then increment the counter on every ancestor; during the unlock operation we only need to decrement the counter on every ancestor.
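That scheme can be sketched as follows; the class and method names are illustrative and not the post's actual code, and child pointers are omitted since the operations only ever walk the parent chain:

```java
// Each node knows its parent, whether it is locked, and how many
// locked nodes live anywhere below it.
class LockNode {
    final LockNode parent;
    boolean locked;
    int lockedDescendants;

    LockNode(LockNode parent) { this.parent = parent; }

    // O(1): the answer is stored on the node itself.
    boolean isLocked() { return locked; }

    // O(H): descendants are checked via the counter, ancestors by
    // walking the parent chain; on success the counters are updated.
    boolean lock() {
        if (locked || lockedDescendants > 0) return false;
        for (LockNode a = parent; a != null; a = a.parent) {
            if (a.locked) return false;
        }
        locked = true;
        for (LockNode a = parent; a != null; a = a.parent) {
            a.lockedDescendants++;
        }
        return true;
    }

    // O(H): clear the flag and decrement the counter on every ancestor.
    boolean unlock() {
        if (!locked) return false;
        locked = false;
        for (LockNode a = parent; a != null; a = a.parent) {
            a.lockedDescendants--;
        }
        return true;
    }
}
```

Locking a node correctly blocks both its ancestors (their counter is now positive) and its descendants (they see a locked ancestor on the walk up).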

You can check my implementation on my Gist along with some test cases in BSTLockJTests.

I was a bit lazy so I reused an existing BST implementation that is not even fully balanced, but I am expanding this code and do not like rewriting too much :)


[Java] Find duplicate element in array with binary search

Disclaimer: this solution comes from interviewcake, where they mention it as being easier to derive (I guess) than the ones we saw before using Floyd's and Brent's cycle detection algorithms, but to be honest I feel it's the other way around. Bonus points because the graph solution is marked as BEAST MODE, which definitely makes us feel proud once the implementation is done :)

Anyway, we still want to keep space constant and not destroy the input, but we have no idea that graphs exist so what options are we left with?

When searching for something, performing a binary search is a good idea. In our case though, the input is not built in a way where we can safely apply the algorithm, since the array is not ordered. But this problem is very specific: given a series of N+1 items in the range 1 to N, find one of the duplicates.
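One way to make binary search applicable, which as far as I can tell is the essence of this approach, is to search over the range of possible values rather than over the array positions: count how many elements fall in the lower half of the range, and by the pigeonhole principle, if there are more elements than distinct values, a duplicate must be in that half. A hedged sketch (names are mine):

```java
class FindRepeatBinarySearch {
    // Values are in 1..N and the array has N+1 elements, so some value
    // range must contain more elements than it has distinct values.
    static int findRepeat(int[] nums) {
        int lo = 1, hi = nums.length - 1;   // the value range 1..N
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            int countLower = 0;
            for (int v : nums) {            // O(N) scan per halving step
                if (v >= lo && v <= mid) countLower++;
            }
            // more elements than distinct values in [lo, mid]:
            // a duplicate must lie in the lower half
            if (countLower > mid - lo + 1) hi = mid;
            else lo = mid + 1;              // otherwise it is in the upper half
        }
        return lo;                          // range shrank to the duplicate
    }
}
```

Each halving step rescans the whole array, which is where the O(N log N) runtime comes from, while the space stays constant and the input is never modified.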

[Java] Find duplicate element in array in constant space and linear time v2

Well, a good night's sleep always helps, apparently. Yesterday we saw how to find a duplicate element in a constrained array in linear time and constant space, basing the cycle detection part of the implementation on Floyd's algorithm.
Today we do the same, but use Brent's algorithm instead, which supposedly should be faster overall.

You can find the implementation of findRepeatBrent on my Gist, and I also provide some timing comparisons in the compareFloydAndBrent method in my test cases FindRepeatJTests. Of course you need multiple runs, and maybe a bigger input as well, to appreciate the difference, but it appears that overall there is a small performance improvement with Brent's method.
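For reference, here is a hedged sketch of how Brent's cycle detection can be applied here; it is my own version and may well differ from the Gist's findRepeatBrent. The array is read as a function index → value, the first phase finds the cycle length by letting the hare run ahead in powers of two, and the second phase locates the cycle entry, which is the duplicate:

```java
class FindRepeatBrent {
    static int findRepeatBrent(int[] nums) {
        // Values are in 1..N, so nothing points back to index 0:
        // index 0 is a safe tail node leading into the cycle.
        int power = 1, lam = 1;    // lam accumulates the cycle length
        int tortoise = 0;
        int hare = nums[0];
        while (tortoise != hare) {
            if (power == lam) {    // start a new power-of-two window
                tortoise = hare;
                power *= 2;
                lam = 0;
            }
            hare = nums[hare];
            lam++;
        }
        // lam is now the cycle length: put the hare lam steps ahead of
        // the start, then advance both until they meet at the cycle entry.
        tortoise = 0;
        hare = 0;
        for (int i = 0; i < lam; i++) hare = nums[hare];
        while (tortoise != hare) {
            tortoise = nums[tortoise];
            hare = nums[hare];
        }
        return tortoise;           // the cycle entry is the duplicate value
    }
}
```

Brent's supposed edge over Floyd's is that the hare only ever takes single steps, so the function (here, an array read) is evaluated fewer times per iteration.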

You can also find here an implementation that is slower, O(N log N), but still uses constant space, based on binary search.


[Java] Find duplicate element in array in constant space and linear time

Sooo.. spoiler alert: this one was way harder than it should have been, thanks to 0-indexed arrays.

Given an array of integers in the range 1..N and length N+1 find one of the possible duplicate elements in O(N) time and O(1) space WITHOUT altering the input.

Those constraints are quite strong since finding a duplicate in O(N) time and space is pretty easy: place all elements in a HashSet while iterating over the input and return as soon as a duplicate is found.

Finding a duplicate in O(N log N) time and O(log N) space is also easy: sort the input (using a kickass quicksort implementation, otherwise good luck keeping those costs down) and then walk over it, comparing elements in pairs until the duplicate is found.

Finding it without destroying the input and without using additional space is also easy in O(N^2) time: for each element, loop over the full array until a duplicate is found.
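The Floyd-based idea mentioned earlier can be sketched as follows; this is my own hedged version and the Gist's implementation may differ in the details. Reading index → value as a linked list, a duplicate value is a node with two incoming edges, i.e. the entry of a cycle, and since values are in 1..N nothing ever points back to index 0, so it is a safe tail to start from:

```java
class FindRepeatFloyd {
    static int findRepeatFloyd(int[] nums) {
        // Phase 1: slow moves one step, fast moves two steps;
        // they are guaranteed to meet somewhere inside the cycle.
        int slow = nums[0];
        int fast = nums[nums[0]];
        while (slow != fast) {
            slow = nums[slow];
            fast = nums[fast];
            fast = nums[fast];
        }
        // Phase 2: the cycle entry is equidistant from the list start
        // and from the meeting point, so restart one pointer at the
        // start and advance both in lockstep until they meet.
        slow = 0;
        while (slow != fast) {
            slow = nums[slow];
            fast = nums[fast];
        }
        return slow;   // the cycle entry is the duplicate value
    }
}
```

Both phases are plain pointer walks over the array, giving O(N) time, O(1) space, and an untouched input.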