11/11/2017

[Java] Find duplicate element in array in constant space and linear time

Sooo.. spoiler alert: this one was way harder than it should be thanks to 0-index arrays.

Given an array of integers in the range 1..N and length N+1 find one of the possible duplicate elements in O(N) time and O(1) space WITHOUT altering the input.

Those constraints are quite strong since finding a duplicate in O(N) time and space is pretty easy: place all elements in a HashSet while iterating over the input and return as soon as a duplicate is found.

Finding a duplicate in O(N log N) time and O(log N) space is also easy: sort the input - using a kickass quicksort implementation otherwise good luck keeping those costs down - and then walk over it comparing elements in pairs until the duplicate is found.

Finding it without destroying the input and without using additional space is also easy in O(N^2) time: for each element, loop over the full array until a duplicate is found.


Now, for the real men challenge instead: O(N) time, O(1) space, NO input altering. This is possible only thanks to the nature of the problem itself: we are given N+1 elements in range 1 to N, so there is at least one duplicate.

Well then, what if we stop caring about the array and the elements themselves and switch point of view on the problem: consider each element in the array as a node in a graph whose value is the pointer to a single other node.

Since we know there is at least one duplicate, the graph must contain at least one loop. Therefore we now simply need to identify the loop start position.

As starting position we must choose the end of the array (N+1), since we know no element can ever point to it (range 1 to N), therefore we will move at least once before entering in a loop avoiding unlucky self loops in the start.

Now why is this more complex than it appears? The magic behind the 0-index arrays. We need to shift all our reasoning to reflect that fact while walking over the graph; this by itself would not be a big issue, but it makes the classical off by one errors easy to encounter.

You can check my implementation of findRepeat on my Gist along with some test cases in FindRepeatJTests.

You can find here the implementation using Brent's algorithm instead as well as a timing comparison.

You can also find here an implementation that is slower O(N log N) and still uses constant space, based on binary search

No comments:

Post a Comment

With great power comes great responsibility.

Da grandi poteri derivano grandi responsabilità.