10/06/2021

[Java] Manacher's algorithm for longest palindromic substring

Last time we saw a quite simple O(N^2) solution to find the longest palindromic substring using constant space.

It turns out there is an efficient O(N) solution that uses O(N) additional space, proposed by Glenn Manacher.

I suggest watching this video to better understand how does the algorithm work and see a compact solution that works well for both even and odd length palindromes.

Let's take a look at the algorithm using some examples.

Given this string:

abacaba

we run the O(n^2) approach and expand outwards from each position, BUT we track for each position the maximum length of a palindrome we discovered with center in that position. 

We consider positions between characters as centers of even length palindromes (# marks beginning and end of string)

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
#   a   b   a   c   a     b   a    #


If we expand outwards from each position, when we reach character C in the main loop, we will run the inner loop and expand successfully all the way to the right until the end of the string. Now we know there is a palindrome of length 7 with center on position 8 which also means its boundaries are 0 and 15.

Our calculation so far tells us:


index  = 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
char   = #   a   b   a   c   a     b   a    #
length = 0 0 1 0 3 0 1 0 7 ? ?  ?  ?  ?  ?  ?


Now let's look at character B in position 12, we know that he is part of a longer palindrome since its position is still withing the boundaries of the current one.

This means that on the OPPOSITE, mirrored side of character C we MUST find B again, and it also means that EVERY other character before and after us between the center C and the right boundary MUST appear on the mirror side as well (otherwise C wouldn't be the center of a palindrome).

But if EVERY other character appears in that EXACT order, then WE are the center of another palindrome long at LEAST the SAME length of the palindrome found on the mirror side, UP TO the current right boundary (as the space right after that boundary is not explored yet from our position).

Our calculation can then be updated to:


index  = 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
char   = #   a   b   a   c   a     b   a    #
 

length = 0 0 1 0 3 0 1 0 7 ? ?  ?  3  ?  ?  ?
 

Now assume our string was actually longer than what we have, then there might be characters after position 15 that could contribute in making a longer palindrome with center in B in position 12.

When evaluating B in position 12 it's therefore pointless to explore A in position 13 as we already know from the previous observations that B is the center of a palindrome of length 3. We can then skip expanding from B itself and just attempt the expansion from position 15 onwards, which is the known right boundary of the palindrome with center in B position 12.

In the case of character A in position 4 instead, the mirror side tells us he is the center of a palindrome of length 1 which would fall within the current boundaries of our palindrome with center in C:

index  = 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
char   = #   a   b   a   c   a     b   a    #
length = 0 0 1 0 3 0 1 0 7 0 1  0  3  ?  ?  ?
 

As A in position 10 does NOT reach the boundary of the palindrome containing it, we CANNOT attempt expansion from that boundary as it wouldn't make sense. However attempting an expansion from A in position 10 itself will immediately fail therefore we won't execute our inner loop at all.

We notice 2 things that happen when we expand:

  1. we FAIL to expand immediately
  2. we expand from a position farther than our current neighbor if we are part of a bigger palindrome

Therefore while expanding we do NOT go over the same characters from the right side multiple times,
particularly, as soon as we managed an expansion until the end of the string, we simply do NOT expand at all anymore as we can't move forward from the end of the string itself.

Once we are done with the main loop we MUST scan our length data in order to find the longest palindrome(s) we have. Finally when we have identified them, we can convert their start and end indices into valid indices in the source string to extract the desired substring.

You can find my implementation of longestPalindromicSubstringManacher on my Gist along with some tests in LongestPalindromicSubstringManacherJTests.

No comments:

Post a Comment

With great power comes great responsibility