12/11/2017

[Java] Find duplicate element in array with binary search

Disclaimer: this solution comes from interviewcake, where they mention this as being easier to (I guess) derive than the ones we saw before using Floyd's and Brent's cycle detection algorithms, but to be honest I feel the other way around. Plus bonus points because the graph solution is marked as BEAST MODE which definitely makes us feel proud when the implementation is done :)

Anyway, we still want to keep space constant and not destroy the input, but we have no idea that graphs exist so what options are we left with?

When searching for something, a good idea is to perform a binary search. In our case though, the input structure is not really built in a way where we can safely apply the algorithm since the array is not ordered. But this problem is very specific: given a series of N+1 items in the range 1 to N, find one of the duplicates.


Well then, what if we do not care much about the array we are given as input and focus on the actual search range we have: 1..N. Our duplicate item MUST be in there, since that is the problem definition. Now we suddenly have an ordered array the is perfectly suitable for binary search, but what are we searching for?

If we split the search range in two halves: 1..N/2 and (N/2 + 1)..N we know for sure at least one of the two halves contains at least one duplicate element. Therefore we can evaluate the full array against the search range and compare the number of items found in the range against the expected number of items in that range; if they do not match, our duplicate is definitely there.

We can keep iterating (NO recursion, otherwise space cost is no longer constant) until the search space is only one element, our duplicate.

Gotchas:

mid = floor + ((ceil - floor) / 2);

tattoo that stuff somewhere, it is the binary magic sauce. You miss that, it will never work.

floor = 1;

This is your starting range floor. NOT 0, we are NOT walking through the array, but through the SEARCH SPACE 1..N.

You can find the implementation of findRepeatBinarySearch on my Gist along with some test cases in findRepeatJTests and a timing comparison with the findRepeatBrent implementation; in this case the difference is absolutely evident.

No comments:

Post a Comment

With great power comes great responsibility.

Da grandi poteri derivano grandi responsabilità.