19/06/2021

[Java] Longest substring with K unique characters

Given a string s and an integer K, find the longest substring with K unique characters in it.

 

We use a sliding window technique to walk along the string while tracking for each character where did we see it last.

As soon as we have found K unique characters, we check if the current substring length is better than our temporary maximum and update the partial result accordingly. We repeat this check until there are K unique characters available in the current substring.

 

Until we find K unique characters, we just keep walking and track each character we see, as we can't yet produce a result.

When the current character was NOT seen in the current window AND by considering it we would exceed the K limit, we are in one of the following scenarios (sample limit is 2):
 

(1) oldest character to evict is on the start of the window

        abc
        i j
 

our window starts at i and we now reach character j. We now have 3 unique characters and our limit is 2, we need to evict the oldest which is A, and reduce our window before continuing. We simply move our window to the next character after A:

        abc
         ij

 

(2) oldest character to evict appears again within our window

        abcba
         i  j

 

our window starts at i and we now reach character j. We have again 3 unique characters within our window, we need to evict the oldest which is B and reduce our window. 

However we cannot simply move to the next, as we would still have 3 unique characters afterwards, we therefore need to jump to the last occurrence of that character (NOT after it as we would be cutting out a valid substring otherwise).


Every other character that we have found in between the two occurrences has to be removed as well, since the jump would cut them out anyway.

        abcba
           ij


To differentiate the two cases, we apply some logic and in case (2) we do not jump directly, rather, we move the window start pointer forward in a loop, deleting from our map all characters we encounter in between.

Additionally, we need to remember to KEEP the last known occurrence of the character at our window start range (B) which we definitely have deleted while shrinking the window in case (2).

The total cost is O(N) time and space as the inner loop is NOT repeated for each character in the string, rather would make us walk the string twice at most.

Additionally, given that our character set will always be limited, we could say space is O(1) as no matter what string we get, we won't ever store more than the number of available characters in our alphabet in our map.

You can check my implementation of longestSubstringWithKUniqueCharacters on my Gist along with some tests in LongestSubstringWithKUniqueCharactersJTests.

No comments:

Post a Comment

With great power comes great responsibility