Here is a problem that takes more time to code than to figure out: given N files where each line contains a timestamp (as long) and some data sorted in ascending order, merge them in a single file also sorted in ascending order.
The files can be arbitrarily big and are passed in input as a full path list; each has content in the format:
timestamp#data
Since the files can be arbitrarily big, we might want to avoid loading them all into memory at once; we can instead keep an open pointer to each of them using O(N) additional space.
This is not enough though, since we also need to write entries to the output file in ascending order too, which would be a big issue if the input files were themselves not ordered already. As long as they are already ordered, we can avoid using a mergesort modification for example and rely on a min heap (aka PriorityQueue) instead.
We will read one line from each file and place it in the min heap which will be in size of O(N) as well and will have a O(log N) handling complexity. To understand later on which entry corresponds to which file and pointer, we use a custom structure FileEntry as object for the heap, meaning we override equals, hashCode and compareTo and we also use a HashMap from filename to BufferedReader so that we can quickly read the next line from each input file once we determine from which one we need to read next.
We keep looping over the heap until it's empty, meaning we read all the lines and also wrote them out. For each element we remove from the heap, we add, if any, a new element (the next line to read) from the same source file; if the file is finished, we remove the reader from the map.
Total runtime is O(N log N) and total space is O(N).
The hardest part here is actually the handling of file IOExceptions, as we might have many BufferedReaders active at the same time and need to gracefully try to close all of them if we encounter an error during processing. The algorithm itself is quite short, but there is a lot of file handling code around it unfortunately; also some code was added to have a more graceful error handling, but it is not strictly necessary.
You can check the implementation of mergeSortedFiles on my Gist along with some test cases (STUB!) in MergeFilesJTests. This time I was too lazy to properly create JUnit tests for these File operations, so please check out the source and edit it as needed, also do NOT run ALL tests together: you will get a "file already exists" exception since the out file is created but never automatically deleted.
Pages
▼
21/11/2017
14/11/2017
[Java] Implement locking mechanism using a binary tree
This is an interesting exercise: implement a locking mechanism backed by a tree structure with parent pointers where a node can be locked if and only if none of his ancestors and descendants are locked.
Have the lock testing operation run in O(1) time, while the lock and unlock operations should run in O(H) time where H is the height of the tree.
The fact that we are given the desired runtimes is the hint we need to understand how to implement this and the parent pointer is also a key element.
A lock testing operation in constant time means we are definitely tracking the information we need on the node itself.
An O(H) lock and unlock operation means we are not scanning both branches of a node AND the ancestors every time we need to check whether any of those contains an already locked node.
The way we can accomplish this then, is if we are tracking an additional piece of information in each node and increase the complexity of both operations to maintain this information updated.
If in each node we track for example the number of locked descendants in both subtrees, we would only need to test if any ancestor is locked during the lock operation before performing the lock and we would need to reduce the count of locked nodes in our ancestors during the unlock operation.
You can check my implementation on my Gist along with some test cases in BSTLockJTests.
I was a bit lazy so I reused an existing BST implementation that is not even fully balanced, but I am expanding this code and do not like rewriting too much :)
Have the lock testing operation run in O(1) time, while the lock and unlock operations should run in O(H) time where H is the height of the tree.
The fact that we are given the desired runtimes is the hint we need to understand how to implement this and the parent pointer is also a key element.
A lock testing operation in constant time means we are definitely tracking the information we need on the node itself.
An O(H) lock and unlock operation means we are not scanning both branches of a node AND the ancestors every time we need to check whether any of those contains an already locked node.
The way we can accomplish this then, is if we are tracking an additional piece of information in each node and increase the complexity of both operations to maintain this information updated.
If in each node we track for example the number of locked descendants in both subtrees, we would only need to test if any ancestor is locked during the lock operation before performing the lock and we would need to reduce the count of locked nodes in our ancestors during the unlock operation.
You can check my implementation on my Gist along with some test cases in BSTLockJTests.
I was a bit lazy so I reused an existing BST implementation that is not even fully balanced, but I am expanding this code and do not like rewriting too much :)
12/11/2017
[Java] Find duplicate element in array with binary search
Disclaimer: this solution comes from interviewcake, where they mention this as being easier to (I guess) derive than the ones we saw before using Floyd's and Brent's cycle detection algorithms, but to be honest I feel the other way around. Plus bonus points because the graph solution is marked as BEAST MODE which definitely makes us feel proud when the implementation is done :)
Anyway, we still want to keep space constant and not destroy the input, but we have no idea that graphs exist so what options are we left with?
When searching for something, a good idea is to perform a binary search. In our case though, the input structure is not really built in a way where we can safely apply the algorithm since the array is not ordered. But this problem is very specific: given a series of N+1 items in the range 1 to N, find one of the duplicates.
Anyway, we still want to keep space constant and not destroy the input, but we have no idea that graphs exist so what options are we left with?
When searching for something, a good idea is to perform a binary search. In our case though, the input structure is not really built in a way where we can safely apply the algorithm since the array is not ordered. But this problem is very specific: given a series of N+1 items in the range 1 to N, find one of the duplicates.
[Java] Find duplicate element in array in constant space and linear time v2
Well a good night's sleep always helps apparently. Yesterday we saw how to find a duplicate element in a constrained array in linear time and constant space and based the implementation for the cycle detection on Floyd's algorithm.
Today we do the same, but use Brent's algorithm instead which supposedly should be overall faster.
You can find the implementation of findRepeatBrent on my Gist and I also provide some timings comparison in the compareFloydAndBrent method in my test cases FindRepeatJTests. Of course you need multiple runs and maybe a bigger input as well to appreciate the difference, but it appears that overall there is a small performance improvement with Brent's method.
You can also find here an implementation that is slower O(N log N) and still uses constant space, based on binary search.
Today we do the same, but use Brent's algorithm instead which supposedly should be overall faster.
You can find the implementation of findRepeatBrent on my Gist and I also provide some timings comparison in the compareFloydAndBrent method in my test cases FindRepeatJTests. Of course you need multiple runs and maybe a bigger input as well to appreciate the difference, but it appears that overall there is a small performance improvement with Brent's method.
You can also find here an implementation that is slower O(N log N) and still uses constant space, based on binary search.
11/11/2017
[Java] Find duplicate element in array in constant space and linear time
Sooo.. spoiler alert: this one was way harder than it should be thanks to 0-index arrays.
Given an array of integers in the range 1..N and length N+1 find one of the possible duplicate elements in O(N) time and O(1) space WITHOUT altering the input.
Those constraints are quite strong since finding a duplicate in O(N) time and space is pretty easy: place all elements in a HashSet while iterating over the input and return as soon as a duplicate is found.
Finding a duplicate in O(N log N) time and O(log N) space is also easy: sort the input - using a kickass quicksort implementation otherwise good luck keeping those costs down - and then walk over it comparing elements in pairs until the duplicate is found.
Finding it without destroying the input and without using additional space is also easy in O(N^2) time: for each element, loop over the full array until a duplicate is found.
Given an array of integers in the range 1..N and length N+1 find one of the possible duplicate elements in O(N) time and O(1) space WITHOUT altering the input.
Those constraints are quite strong since finding a duplicate in O(N) time and space is pretty easy: place all elements in a HashSet while iterating over the input and return as soon as a duplicate is found.
Finding a duplicate in O(N log N) time and O(log N) space is also easy: sort the input - using a kickass quicksort implementation otherwise good luck keeping those costs down - and then walk over it comparing elements in pairs until the duplicate is found.
Finding it without destroying the input and without using additional space is also easy in O(N^2) time: for each element, loop over the full array until a duplicate is found.
09/11/2017
[Sheet Music] Ludovico Einaudi - Divenire
A contemporary masterpiece, you can find here the full album.
07/11/2017
[Java] Convert expression to reverse polish notation
The closing chapter of my RPN journey, how to convert an infix notation expression to a postfix notation one, so from 3 + 4 we get 3 4 +
Here as well it appears immediately that a String is not the most comfortable input structure, so we stick to an array.
First idea was to scan the input and tokenize it in a left side portion and right side portion for each operator we found:
3 - 4 x 5
would become first:
3, 4 x 5, -
Then finally:
3, 4, 5, x, -
Here as well it appears immediately that a String is not the most comfortable input structure, so we stick to an array.
First idea was to scan the input and tokenize it in a left side portion and right side portion for each operator we found:
3 - 4 x 5
would become first:
3, 4 x 5, -
Then finally:
3, 4, 5, x, -
05/11/2017
[Java] Evaluate reverse polish notation in constant space
Yes it is possible, yes it took me one full day research included, but the RPN can be evaluated in O(N) time and O(1) space, sacrificing the input in the process.
There are some key points to this but let's start with the general idea. To get the algorithm down from O(N) space (the stack) to O(1) we cannot use additional costly structures; luckily our input is all we need, if we are allowed to destroy it while processing. This is often the case with this type of optimization, an in place algorithm.
Consider the following expression corresponding to 3 - (4 x 5): 3 4 5 x -
For RPN reading from left to right, the operator always follows the two operands it should work on and once we evaluated an expression, we no longer need the original data but only the result, so 3 4 5 x - can be rewritten as 3 20 -
We can therefore recycle the processed space (always 3 elements, two operands and an operator) in the input structure to somehow track which elements should we process next. This means we need to point to the next unprocessed element in the sequence, so that the next operator can retrieve its two operands as if they were adjacent.
There are some key points to this but let's start with the general idea. To get the algorithm down from O(N) space (the stack) to O(1) we cannot use additional costly structures; luckily our input is all we need, if we are allowed to destroy it while processing. This is often the case with this type of optimization, an in place algorithm.
Consider the following expression corresponding to 3 - (4 x 5): 3 4 5 x -
For RPN reading from left to right, the operator always follows the two operands it should work on and once we evaluated an expression, we no longer need the original data but only the result, so 3 4 5 x - can be rewritten as 3 20 -
We can therefore recycle the processed space (always 3 elements, two operands and an operator) in the input structure to somehow track which elements should we process next. This means we need to point to the next unprocessed element in the sequence, so that the next operator can retrieve its two operands as if they were adjacent.
04/11/2017
[Java] Evaluate reverse polish notation using a stack
Reverse polish notation, aka the WTF! way of writing arithmetical expressions, presents an interesting exercise on stacks.
The goal is to write a program to evaluate RPN expressions eg: 3 4 + would return 7.
The input can be passed in many ways, good candidates are arrays and lists, a stack directly or a string, although in this case, you would need a separator character to understand when does a number start and end.
The idea here is to process the input sequence from left to right without creating the stack beforehand. If we encounter a number (operand), we push it onto the stack and when we encounter an operator, we pop two items from the stack, then apply the operator in the (gotcha!) REVERSE order because the stack is LIFO and this matters for the asymmetrical operations such as - and /
An interesting note, dividing by 0 a double will NOT throw an exception, we must test if we got infinite as result!
For the implementation I use a String array as input and a Double stack to allow decimal results. Only the four basic operations (sum, subtraction, multiplication, division) are supported, but the code can easily be expanded to support more and even unary operations (pop one item instead of two).
As last note: why can't we do this in constant space while parsing the string itself? Because of situations such as: 3 - (4 x 5) which translates to 3 4 5 x -
Therefore we cannot expect an operator to always follow two operands.. So: String as input = bad.
The current implementation therefore uses O(N) space and time. But you can check here for an implementation that uses O(N) time and O(1) space!
You can check the implementation on my Gist along with some test cases in RPNJTests.
You can find here the code to convert an expression to reverse polish notation.
The goal is to write a program to evaluate RPN expressions eg: 3 4 + would return 7.
The input can be passed in many ways, good candidates are arrays and lists, a stack directly or a string, although in this case, you would need a separator character to understand when does a number start and end.
The idea here is to process the input sequence from left to right without creating the stack beforehand. If we encounter a number (operand), we push it onto the stack and when we encounter an operator, we pop two items from the stack, then apply the operator in the (gotcha!) REVERSE order because the stack is LIFO and this matters for the asymmetrical operations such as - and /
An interesting note, dividing by 0 a double will NOT throw an exception, we must test if we got infinite as result!
For the implementation I use a String array as input and a Double stack to allow decimal results. Only the four basic operations (sum, subtraction, multiplication, division) are supported, but the code can easily be expanded to support more and even unary operations (pop one item instead of two).
As last note: why can't we do this in constant space while parsing the string itself? Because of situations such as: 3 - (4 x 5) which translates to 3 4 5 x -
Therefore we cannot expect an operator to always follow two operands.. So: String as input = bad.
The current implementation therefore uses O(N) space and time. But you can check here for an implementation that uses O(N) time and O(1) space!
You can check the implementation on my Gist along with some test cases in RPNJTests.
You can find here the code to convert an expression to reverse polish notation.
02/11/2017
[Java] Generate string permutations recursively
Ahhhh, string permutations. The timeless classic. Also painfully slow with its O(N!) time and space complexity. Let's put a twist on it and do it recursively just because we can.
Also, just because we can, we use StringBuilders instead of Strings. Is this a good idea? Is the cost of creating lots of StringBuilders offsetting the cost of creating a new String object everytime? That's what method timing is for, we'll leave this as "exercise" for the reader.
Anyway, enough introduction, let's go to the meat: we call a recursive method which will have as input the size of the result string to create, the string we built so far, the string of unused characters so far and the output collection.
Then, for each unused character, we add it to the current string and recurse on the remaining ones leaving that out. The base case is when we no longer have characters to use; if the string we built is the correct length, we add it to the result set and return, otherwise simply return.
Note: StringBuilder.deleteCharAt alters the source directly instead of returning a new object, therefore calling this method as StringBuilder(x.deleteCharAt) and StringBuilder(x).deleteCharAt makes quite the difference!
You can check the implementation on my Gist along with some test cases in PermutateJTests.
Also, just because we can, we use StringBuilders instead of Strings. Is this a good idea? Is the cost of creating lots of StringBuilders offsetting the cost of creating a new String object everytime? That's what method timing is for, we'll leave this as "exercise" for the reader.
Anyway, enough introduction, let's go to the meat: we call a recursive method which will have as input the size of the result string to create, the string we built so far, the string of unused characters so far and the output collection.
Then, for each unused character, we add it to the current string and recurse on the remaining ones leaving that out. The base case is when we no longer have characters to use; if the string we built is the correct length, we add it to the result set and return, otherwise simply return.
Note: StringBuilder.deleteCharAt alters the source directly instead of returning a new object, therefore calling this method as StringBuilder(x.deleteCharAt) and StringBuilder(x).deleteCharAt makes quite the difference!
You can check the implementation on my Gist along with some test cases in PermutateJTests.
01/11/2017
[Java] Zip singly linked list
List zipping means interleaving the list nodes from the top and bottom of a given list to create a new list.
Eg with this input list:
A -> B -> C -> D
the zipping is:
A -> D -> B -> C
Doing this in constant space and quadratic time is trivial, but there is a better solution that avoids scanning the list every time with two pointers running one ahead of the other to find the current last element in the list.
Again as with most linked list problems, the correct handling of the pointers is the trickiest part. The idea is to spend some time to preprocess the list in O(N) time to split it in two halves, and reverse the second half; this way, we now simply need to walk both lists and interleave the nodes in order to produce our result.
Eg with this input list:
A -> B -> C -> D
the zipping is:
A -> D -> B -> C
Doing this in constant space and quadratic time is trivial, but there is a better solution that avoids scanning the list every time with two pointers running one ahead of the other to find the current last element in the list.
Again as with most linked list problems, the correct handling of the pointers is the trickiest part. The idea is to spend some time to preprocess the list in O(N) time to split it in two halves, and reverse the second half; this way, we now simply need to walk both lists and interleave the nodes in order to produce our result.
[Java] Reverse singly linked list
Here is a simple algorithm to reverse a singly linked list in O(N) time and O(1) space. The only tricky part is the correct handling of the pointers :)
The idea is to start with a previous pointer to null and a pointer to the head of the list, then:
1) track the next element in a temporary variable
2) reverse the current element pointer
3) consider the current element as the previous for next iteration
4) move the current element to the next one to handle
5) when no more elements are left to be processed, the previous pointer will be the new list head
You can check more singly and doubly linked lists utilities on my Gist along with some test cases in SingleLinkedListJTests.
The idea is to start with a previous pointer to null and a pointer to the head of the list, then:
1) track the next element in a temporary variable
2) reverse the current element pointer
3) consider the current element as the previous for next iteration
4) move the current element to the next one to handle
5) when no more elements are left to be processed, the previous pointer will be the new list head
//reverse singly linked list
public static SingleLinkedListNode reverseList(SingleLinkedListNode head){
if(head == null) return head;
SingleLinkedListNode prev = null, next, curr = head;
/*
1) track next element
2) reverse current element pointer
3) track current element as previous for next iteration
4) move current element to the next one to handle
*/
while (curr != null) {
next = curr.next;
curr.next = prev;
prev = curr;
curr = next;
}
//at this time this will point to the beginning of the reversed list!
return prev;
}
You can check more singly and doubly linked lists utilities on my Gist along with some test cases in SingleLinkedListJTests.