30/06/2021

[Java] Ways to traverse a grid

Given two integers N and M representing the number of rows and columns in a grid, calculate in how many ways we can traverse from top left to bottom right moving only right or down.

We can notice that the starting cell cannot be reached by moving, so we set its value to 0. All cells on the first row and column can only be reached in one way that is moving always right or down. We set them to 1.

All other cells can be reached by moving right or down from a previous left or top cell, which means they can be reached by continuing either of those paths, therefore they can be reached by summing up the ways of reaching the previous top and left cells.

This approach takes O(MN) time and space.

However it turns out there is a better approach that works in constant space and O(M+N) time.

Since each inner cell can be reached by summing some combinations of reaching previous cells, it means we are enumerating permutations of right and down movements up to a certain cell.

Therefore the magic fomula is (N+M)! / (N! * M!)

However, we need to use N-1 and M-1 as we are calculating the inner portion of a matrix where we actually move either direction, example:

0111

1###

1###

1###

We only do the calculation for the # cells as the boundary ones only offer one way of reaching them.

Because we only need to calculate three factorials, and the biggest is N+M, our runtime is O(N+M) - beware of overflows though!

You can check my implementations of waysToTraverseGrid and waysToTraverseGridCombinations on my Gist along with some tests in WaysToTraverseGridJTests.

[Java] Remove consecutive nodes with zero sum from linked list

Given a singly linked list, remove all consecutive nodes that sum up to 0 from it.

Examples:

0 return null

6, -6 return null

10, 4, -4 return 10

10, 4, -3, -1, 2 return 10, 2

We use two pointers, one is sitting before the head, the other walks the list.
We keep summing values of nodes up and each time we store in a map the pair(sum up to this node, node) then we check the current sum, we can have the following cases:
(1) sum is 0: this means from start to end all elements sum to 0, we need to remove all of them and we need to carefully handle the head:
- if start pointer is null, it means this 0 sum includes the head, we will update it
- if start pointer is not null, it means this 0 sum occurs from a node (which might be the head), we update the NEXT of it
(2) sum is a value we already encountered: this means that AFTER the last node where we saw that sum until the end all elements sum to 0, we update the NEXT of the current start to cut them out
(3) otherwise we store the value in the map and move on to the next element

Whenever we perform a cut, we rest the counter to 0 and clear the map since we remove all consecutive nodes whose sum is 0, therefore after a cut we restart from the last node we kept.

This works in O(N) time and uses O(N) space.

You can check my implementation of removeNodesWithZeroSum on my Gist along with some tests in RemoveNodesWithZeroSumJTests.

29/06/2021

[Java] Longest palindromic subsequence

Given a string, find the longest palindromic subsequence it contains.

A non palindromic string, still contains a palindromic subsequence, as a single character is a valid palindrome.

If a string contains a palindromic subsequence, there will be some characters that appear in a certain order on both sides of the middle element of the palindrome (either a single character for odd length palindromes or the space in between two characters for even length palindromes).

For example:

acca, acfca are palindromes

acbgca contains more palindromic subsequences, the longest being acbca or acgca

abccga contains more palindromic subsequences, the longest is acca

For any given palindromic subsequence, we must find the same characters in the same order on both sides of the middle point in the original string.

If we reverse the original string in O(N) time, we can then use our logic to find the longest common subsequence to find this palindrome in O(MN) time and space, which in this case is O(N^2).

Example:

abccga reverses to agccba

And the longest common subsequence to both is acca

asd reverses to dsa

and the longest common subsequence to both is any single character

You can check my implementation of longestPalindromicSubsequence on my Gist along with some tests in LongestPalindromicSubsequenceJTests.

28/06/2021

[Java] Longest common subsequence

Give two strings, find the longest common subsequence. A subsequence is a sequence of characters, not necessarily adjacent, that appear in a specific order in both strings.

Example:

sda, dlolsla

longest common subsequence is "sa" of length 2, other subsequences are "d", "s" and "a".

[Java] Minimum cost of painting houses using K colors

We have N houses in a row and K available colors. Given a NxK matrix where each cell represents the cost of painting house N with color K, find the minimum cost of painting all the houses such that no two adjacent houses have same color.

We notice the solution can be reached using DFS recursive approach where we explore the costs of painting each house using all colors except for the one used for the previous house.

Starting from the first house we have all K choices available, however the second house will have K-1 choices since we used a color already for the first, and so on.

This means we have a O(K^N) runtime as for each house we must explore all those color choices. Our space usage will be O(N) as for each house we explore all others and that would be the depth of our stack.

We can improve on this by caching results as we go such that when we encounter the same (house, excluded color) combination again, we already have our best answer and can return that without further calculation.

We can create another NxK matrix or use a Map where the pair(house, excluded color) is the key and the value is the minimum cost of painting the current house EXCLUDING that color + cost of painting the next excluding the BEST color chosen for the current house.

By doing this, we reduce our runtime to O(NK) as for each house we explore all K colors and use O(NK) space as well for our cache.

You can check my implementation of minCostToPaintHouses on my Gist along with some tests in MinCostToPaintHousesJTests.

[Java] Maximise profit to cut rod

Given a rod of length N and an array indicating the profit from selling a rod of length i + 1, find the best way to cut up the rod in order to maximise the profit of selling all the pieces. Not cutting the rod at all is also a valid option.

Example:

[1,5,2,7]

means our profit for selling a rod piece of length 1 is 1, length 2 is 5, etc

If we picked length 4, the best would be cutting it in two pieces of length 2 for a total profit of 10.

We can use a DFS recursive approach where for each rod size from 1 to N and all possible cuts of a rod of that size, we pick the best between profit of NOT cutting rod and profit of cutting on that length + best profit of cutting the remainder.

Since we have two possible choices at each step, our algorithm runs in O(2 ^N) time and uses O(N) space as at most our DFS depth will reach the length of the rod.

27/06/2021

[Java] Find all duplicates between two sorted arrays

Given two sorted arrays, find all elements that appear in both arrays, return a sorted array of those as result.

Since either array could be much bigger than the other, if we use a two pointer approach where we walk both arrays at the same time and determine which pointer to increase we get a O(M+N) runtime, if instead we use binary search on the bigger array for each element in the smaller, we get a O(M log N) runtime.

Two caveats on this solution, we had created a generic binary search method, but it uses Java generics and in Java int[] is not same as Integer[]. For convenience we therefore use Integer[] as input.

Additionally, we need to output an int[] but we do not know beforehand the size of it (will be at most min(M,N)) so we store results as we go in a queue which will allow us to create the sorted output array in the end.

You can check my implementation of findDuplicatesInArraysBinarySearch on my Gist along with some tests in FindDuplicatesInArraysBinarySearchJTests.

26/06/2021

[Java] Number of subsequences that sum up to target with dynamic programming

We have seen a graph approach to the problem of finding the number of subsequences that sum up to a given target, however that approach was inefficient since we kept evaluating the same unique nodes multiple times.

With dynamic programming, we can reduce complexity by caching partial results.

We approach the problem from a similar perspective, but perform a DFS instead of BFS and cache the ways we calculated a partial sum from a given element in the array.

For each element in the array, all previous subsequences have been investigated, therefore we can consider only the contribution we would add to those partial sums.

Space and time complexity becomes O(NM) where N is size of input array and M is target value.

You can check my implementation of howManySubsequencesSumUpToTargetDP on my Gist along with some tests in HowManySubsequencesSumUpToTargetDPJTests.

[Java] Number of subsequences that sum up to target value

Given an unsorted array of positive integers without duplicates and a target value, find how many subsequences of the array sum up to the given target.

Example: [10,6,2,4] target = 100, result is none

[10,6,2,4] target = 16, result is 2: (10,6,) and (10,2,4,)

We can see this problem as a graph search where each node represents the partial sum generated by the elements chosen in the subsequence create until that point. This is very similar to counting how many ways we can reach a target, but in this instance we are only allowed to pick each item once.

For each value in the array, we have a choice to either include it in a subsequence or not, this choice is repeated for each intermediate node in our graph until we deplete the elements in the input array.

In our example that would be (each node has two children in the next level and elements in parentheses indicate nodes that we did NOT enqueue because they were either the target or bigger than the target):

0

0 10

0 6 10 (16)

0 2 6 8 10 12 (16) (18)

[0 4 2 6 6 10 8 12 10 14 12 (16) (16) (20) (18) (22)]

Before enqueuing a node, we verify whether there are more elements in the array after it, otherwise it makes no sense enqueuing it as we won't get there to process it. Therefore the last level in square brackets is NOT added to the queue.

Everytime we encounter a node whose sum is the target, we track it. Since we have no duplicates and process elements in order, we won't ever add the same node (same subsequence) twice. HOWEVER we keep adding the exact same node over and over to the queue as we need it to calculate the longer subsequences starting from there.

To determine when to start considering the next element from our input, we keep enqueueing a marker value (0) and each time we encounter it, we move a pointer to the next element in the input array. When this pointer exceeds the array bounds, we terminate the search.

Additionally, to reflect the "pick current element or not" choice, we always enqueue the current node without summing it to the current element as well.

For each element in our array we add 2^level elements in the queue at most, and the number of levels is at most is the length of our array, therefore we have a runtime of O(2^N) and space is also O(2^N) which is the case where we have added the last level to our queue. 

This is a clear case where dynamic programming helps reducing the complexity by avoiding the repeated processing of the exact same node multiple times.

You can check my implementation of howManySubsequencesSumUpToTarget on my Gist along with some tests in HowManySubsequencesSumUpToTargetJTests.

[Java] Shortest unique substring

Given an array of unique characters and a string, find the shortest substring containing all the characters in the array.

Example: 

['a','b','c'] and "lol" is empty string

['a','b','c'] and "abc" is empty "abc"
['a','b','c'] and "abbbbca" is empty "bca"

We add all characters in the array to a set and use a sliding window technique where we keep increasing our end range until we have found all missing characters. If we encounter a character that is not valid (was not in the input array), we restart from the next (jump window start to end + 1 to skip this invalid character).

For each character we find, we track the last known position in a map.

When we have found all missing characters, we update best minimum so far, then we try to decrease substring by advancing window start and checking if we still have a valid string.

The character to evict (pointed at by window start range) must appear after the new window start, we continue until possible. 

We get a O(N) time and space solution.

You can check my implementation of getShortestUniqueSubstring on my Gist along with some tests in GetShortestUniqueSubstringJTests.

25/06/2021

[Java] Minimum subarray of sum K

Given an array of positive integers and a target positive integer value, find the subarray of consecutive elements whose sum is equal to or more than target.

We use our trusty sliding window technique, and keep summing until we reach the threshold.

Then we check if we can reduce the current subarray size by reducing the window start range while remaining above the threshold and keep doing that as long as it's possible (either we reached current element in array or sum of subarray would fall below threshold).

At most we do 2N passes on the array as the end range pointer will walk all of it and the start range pointer might follow all the way to the end, so in O(N) time ans constant space we have our answer.

You can check my implementation of minSubarrayOfSumK on my Gist along with some tests in MinSubarrayOfSumKJTests.

[Java] Find word in matrix using KMP pattern searching

Here is an implementation that finds whether a word appears in a matrix, using a more efficient approach based on KMP algorithm for string search.

We can do the word preprocessing in O(S) time, then we walk the matrix and after we have read a whole row or column, we run our algorithm using the whole row or column as our text.

Therefore we do NM work to read all the maximum length strings represented by a whole row or column, giving us a total of N strings of length M + M strings of length N. On each of these we do M and N work respectively to test for a match, giving us total time of O(NM + NM + MN + S) which is O(NM + S) using O(S) extra space.

You can check my implementation of wordSearchLRTBusingKMP on my Gist along with some tests in WordSearchLRTBusingKMPJTests.

[Java] Knuth–Morris–Pratt text search algorithm

The Knuth–Morris–Pratt algorithm is an efficient way of searching for a pattern within some text. By precomputing prefix information from the pattern using extra O(M) space, it gives a O(N+M) algorithm that finds all occurrences of a given string within another string.

The idea is that if some portions of the pattern repeat AND we had found a match until a certain point in the pattern, we do not need to go all the way back to the start of the text, but can continue with an offset that is equal to the matched portion.

You can check my implementation of searchSubstringKMP on my Gist along with some tests in KMPJTests.

[Java] Find word in matrix

Given a matrix of characters and a word, find if the word appears in the matrix. Words can be searched in the matrix only going left to right or top to bottom, but NOT mixing them.

Example

c a r

a r d

r f d

Word = car can be found either in first row or first column but NOT in 'ca' from first row + 'r' from next row.

We can run a loop on each character in the matrix and if we find a match for the first character of the string, we start two searches:

  • rest of the row
  • rest of the column

to see if we find a match for the word.

Assuming our word has length S, our complexity is O(MNS) as for each cell in the matrix we do at most S work.

You can check my implementation of wordSearchLRTB on my Gist along with some tests in WordSearchLRTBJTests.

24/06/2021

[Java] Find intersection of singly linked lists

Given two singly linked lists that might have a node in common, find that node without altering the lists and using constant space.

If we could alter the nodes in the list, we could flag visited ones and do a O(M) followed by an O(N) walk, when we find an already visited node, that is the common node.

An alternative could be to link the tail of a list to the head, then use a cycle detection algorithm to find the start of the cycle which will be the common node.

But without touching the lists or nodes we are limited in our options. One thing we can do is walk both lists once to find their lengths, then walk down the longer list until we reach same length as the other, finally walk down both lists at the same time until we find a common node.

All these approaches have O(M+N) time and O(1) space.

You can check my implementation of findIntersection on my Gist along with some tests in FindIntersectionJTests.

[Java] Find all square submatrices of sum K or greater

Here is an interesting problem, given a non necessarily square matrix and a threshold value K, find all square submatrices whose sum is greater than or equal to K.

Example:

111

111

111

If K = 1, we have:

1x1 submatrices = 9 of value 1

2x2 submatrices = 4 of value 4

3x3 submatrices = 1 of value 9

total is 14 submatrices whose sum is more or equal to K. 

One approach is to consider each cell as the top left corner of a square submatrix. From there we can then expand to squares of increasing sizes while counting the total sum, until the square would go out of bounds. We can do this in O((NM)^2) time as for each of the NM cells we do NM work.

We can notice that we do not need to recalculate the whole square when we increase it since the bigger square includes the smaller one, we can therefore only do the work necessary to include the new values, we cover with the new size:

11#

11#

###

To compute from the 2x2 square the bigger 3x3, we only need to add the cells marked with #.

CAUTION: notice the bottom right would be added twice, therefore we sum all # cells and subtract the value of bottom right once.

This is done with two separate loops both of which have O(min(N,M)) time for up to min(N,M) times, giving us O(min(N,M)^2) time for a total of O(NMmin(N,M))^2) time and O(1) space.

If we precompute the prefix matrix in O(NM) time using extra O(NM) space, we can reuse already calculated submatrices instead. In this case, we consider each cell as the bottom right cell of a square submatrix. For each of the NM cells, we try to expand from there using data from the prefix matrix to quickly calculate the value of each square. This calculation costs O(min(N,M)) and in total we repeat it at most min(N,M) times.

The resulting time then becomes O(NM + min(N,M)^2) at the cost of extra O(NM) space.

Maybe there is some clever mathematical observation to be done using additional space/structures that could bring the time down to O(NM) but I couldn't figure it out yet.

You can check my implementation of numSquaresOfSumK and numSquaresOfSumKDP with the helper function prefixSum to calculate the prefix matrix on my Gist along with some tests in NumSquaresOfSumKJTests and NumSquaresOfSumKDPJTests.


23/06/2021

[Java] Wagner-Fischer algorithm for string edit distance (Levenshtein distance)

We've already seen a way to calculate the edit distance between two strings, which was a bit laborious, turns out there is a nice compact algorithm invented by Wagner and Fischer to achieve the same using dynamic programming with O(NM) time and space complexity.

We create a N+1xM+1 matrix representing our strings with the empty string prepended. We then walk the matrix and determine the cost of generating the rest of the string given the cost of reaching the current edit for the substring we are considering at each point.

The idea is this: 

  • from the empty string to reach the other string we need to insert all characters 
  • from a full string to reach the empty string we need to delete all characters

We can initialize this with a two single loops on M and N (reason why the matrix is size of both strings + 1), then for each position in our matrix, we verify the cost of transforming the current prefixes to have a match by inserting/deleting/replacing the current character.

If the current character in the prefixes is same, we have no work to do, otherwise we must do 1 replacement operation, then we pick the minumum cost between:

  • inserting an additional character in this position
  • deleteing a character from this position
  • replacing a character in this position

in the first two cases, the cost is the cost of generating this prefix from the previous step + 1 insertion/deletion and in the last case it is the cost of generating this prefix from the previous step + cost of replacement (if characters were same, it's 0, otherwise 1).

We will have our final answer in position MxN in our matrix.

You can check my implementation of editDistanceWagnerFischer on my Gist along with some tests in EditDistanceWagnerFischerJTests.

[Java] String edit distance (Levenshtein distance)

Given two strings, find their edit distance. Edits are: character insertion, deletion, replacement. The edit distance is the minimum amount of edits necessary to reach the desired goal.

This sounds like a graph problem, where we start from a string and want to reach another string through a series of modifications. Either BFS or DFS can be used.

In this approach we use BFS and a queue containing the pairs(string, number of edits so far).

We see three possible cases:

(1) start string is longer than target string: in this case we need to remove one or more characters from the start. We generate all strings that result from deleting one character from the start and them to the queue.

(2) start string is shorter than target string: in this case we need to add one or more characters to the start. Adding every possible character is not convenient and we also don't need to do that. We can add a wildcard character which we will replace with the appropriate one later on

(3) start and target string have same length: in this case we walk over the strings at the same time and count how many characters are different, those are the replacements we need to do. If we encounter a wildcard character, we replace it for free as we could have chosen this exact letter when we inserted it

By tracking all strings we generate, we avoid adding duplicates to the queue, this could happen when strings have repeating characters or one is at least two characters shorter than the other. When a string is removed from the queue, we can also remove it from the set as it will be impossible to recreate again since we're constantly adding or removing characters, which means we can't generate a same length string after each step.

Consider that creating all substrings from a string is O(N^2) time, this also applies to adding a character to a string. Additionally, the hashing of a String value might be O(N) time as it's an array of chars behind the scenes.

Assuming the longest string is M and shortest is N, our runtime is O((M-N) x M^2 + (2^(M-N)) x N) as we perform M^2 insert/delete character operations for M-N times until we generate all possible strings of same length. The number of new strings we generate is 2^(M-N) as we have a choice to either insert/delete one character each time or not, and for each of those we do a string scan of the whole shorter string in N time.

I guess we can say when M-N << M and N then our runtime is O(M^2 + N)

We handle special cases such as empty and null strings before our logic so we can return those in constant time.

If strings have same length already, algorithm runs in linear time as we don't generate anything and perform only one pass over the string.

Space complexity is O(2^(M-N))  as at most in the queue and set we store the amount of elements we generate at the lowest level of our traversal.

You can check my implementation of editDistance on my Gist along with some tests in EditDistanceJTests.

22/06/2021

[Java] Find maximum in all subarrays of length K

Given an array of integers and a window size K, find the maximum element for all subarrays in each window.

Example:

1234 and K = 2

(1,2) - 2

(2,3) - 3

(3,4) - 4

An interesting solution that comes up is to use a max heap of maximum size K to track the maximum element within a window of size K over the array.

When we slide the window, we delete from the heap the element that was at the start and add the element that is at the end. This costs O(logK) for both delete and insert. Doing this operation for each element in the input array yields O(N logK) time and O(K) space complexity. For example:

 public static void maxOfK(int[] numbers, int k){  
   PriorityQueue<Integer> maxHeap = new PriorityQueue<Integer>((e1,e2) -> -e1.compareTo(e2));  
   int i = 0;  
   for(int j = 0; j < numbers.length; j++){  
     maxHeap.add(numbers[j]);  
     if(maxHeap.size() == k){  
       System.out.println(maxHeap.peek());  
       maxHeap.remove(numbers[i]);  
       i++;  
     }  
   }  
 }  

However a challenge is to do this in O(N) time and O(K) space only.

[Java] Find all duplicates in array of positive elements in range 1 to N

Given an array where each value is in the range 1..N, find all duplicates.

We can easily do this in O(N) time with additional O(N) space tracking each element in a set, then seeing if we find multiple times the same element.

Sorting in O(N log N) would allow us to use O(1) space to find same result.

If we are allowed to modify the input though, we can do this in O(N) time and O(1) space by modifying the input array while walking on it.

Since all elements are in range 1..N, it means each would point to a valid cell in the array in range 0..N-1.

We walk over the array and look at the cell the current element points to, if we find a negative value, some other element pointed to it before us and we are a duplicate, otherwise we flip the value in that cell and continue.

You can find my implementation of findAllDuplicates on my Gist along with some tests in FindAllDuplicatesJTests.

[Java] Longest subsequence of consecutive numbers

Given an unsorted array of numbers, with duplicates allowed, find the length of the longest sequence of consecutive numbers.

An easy approach is to sort the input in O(N logN) time and then walk over it in O(N) time to find our answer. There are two problems here: (1) we need to sort and (2) if sort algorithm is not in place, it uses extra O(N) space.

21/06/2021

[Java] Delete islands

Given a matrix of zeros and ones, where groupings of ones represent land, delete all islands.
An island is a group of 1 or more ones which are NOT horizontally or vertically connected to a border.

Deleting an island means setting it to 0.

Example:
0001
1010
1000


should return:
0001
1000
1000


where only the lone 1 in the middle is deleted.

For each point in the borders that is a one, we start a DFS search for all horizontally or vertically connected ones, flipping all of them to -1.

We flag each point we add to the stack as soon as we add it to avoid putting duplicates in.

At the end of this search, we scan the whole matrix and restore the flipped ones while deleting the non flipped ones.

Complexity is O(NM) for both time and space. Worst case for space is when matrix is full of ones, therefore we fill the stack with almost all points in the matrix during the DFS.

You can check my implementation of deleteIslands on my Gist along with some tests in DeleteIslandsJTests.

[Java] Test if two matrices are equals

Given two matrix objects, check if they are equals. They are not guaranteed to have the same dimensions. They are equals only if they have the same dimensions and the content is the same.

I just needed this to use in testing another exercise, the code is quite self explanatory and you can find it in areMatrixEquals in my Gist along with tests in AreMatrixEqualsJTests.

[Java] Moments when all lights are blue

While looking for a different exercise, I found this one and decided to give it a go: given an array of unique integers in the range 1..N representing the moments when a light in position input[i] is turned on, return the number of moments when ALL lights are on (Y) in an interrupted sequence.
We call these sequences "blue lights", initially all lights are off (I).

20/06/2021

[Java] Find two missing numbers in array

Given an array of distinct integers in range 1..N and of length N-2 find the two missing numbers.

The O(N) time and space solution should be obvious, we can also use O(1) space if we spend O(NlogN) time sorting the input first.

However, there is a nice bit magic we can do to have a O(N) and O(1) space solution which involves the almighty XOR and the magic formula:

XOR of all elements in range 1..N with all elements in array gives Z, which is XOR of missing numbers X,Y.

Since we have no duplicates and missing numbers are in range 1..N, we know X and Y cannot be equal, this means X will have a bit set in a place that Y hasn't otherwise XOR would be 0 since it sets a 1 in every place where the two bits are different.

We can get a set bit in Z with the magic formula:

z & ~(z - 1)

We can then divide the array in two sets of elements, those who have that bit set and those who don't as our two missing numbers will be part of only one of those sets.

We XOR all those elements together and also with all elements in range 1..N that also have the same bit set or not, the result of the two sets of XORs are X and Y.

For example:

input = 4,2,3 we are missing 1,5 in binary: 001, 101

a = xor of all numbers 1..5 is: 1

b = xor of all elements in the array is: 5

z is a XOR b = 4

since x,y cannot be the same number, they have for sure at least one different bit

we pick rightmost set bit in the xor: 4

a = xor of all numbers in range 1..5 with that bit set is: 0

b = xor of all numbers in range 1..5 with that bit NOT set is: 1

c = xor of all elements in array with that bit set is: 1

d = xor of all elements in array with that bit NOT set is: 4

x = a ^ c = 1

y = b ^ d = 5

You can check my implementation of findTwoMissingNumbers on my Gist along with some tests in MissingTwoNumbersInArrayJTests.

[Java] Find pair of values from two arrays that sum up closer to given target

Given two arrays, not necessarily of same length, find a pair of values with one value from each array whose sum is closest to a given target.

Example

1, 2, 3

3, 1, 4

target = 5, we return (2,3) or (1,4)

target = 10 we return (3,4)

The simple O(NM) solution without using extra space, is simply to compute all sums and track the closest to our target.

Luckily we have just seen how to find a value in a sorted matrix, and a key insight to solving this problem is recognizing we can process our input to end up in the scenario from the other problem.

If we sort the arrays in O(NlogN + OMlogM) time, we can then apply the algorithm from before and find a solution in O(N+M) time. The overall runtime would therefore be O(NlogN + OMlogM) time and O(N+M) space assuming for example we are using mergesort on both arrays.

To avoid creating the matrix though (would be O(NM) time), we can just use pointers and consider one array as the rows and the other as the columns, then calculate only the sums we need while walking around of fictional matrix.

By tracking the best pair and delta from target seen so far, we will eventually reach our solution.

You can check my implementation of findClosestPairSumUpToK on my Gist along with some tests in ClosestPairSumUpToKJTests.


[Java] Minimum number of characters to delete from string to create valid word

Given a string s and a set of words, find the minimum number of characters to delete from the string to generate a valid word.

If no word can be created, return -1.

We use a BFS approach to explore all strings generated by deleting one character from a given string, and track which strings we have generated so far to avoid doing duplicate work.

Each time we generate a new string, we query the set to check whether it's a valid word, and if yes, return the result.

We also notice that from a word of length N, it's impossible to generate any string of length N by removing a character, therefore there is no need to keep a word in our seen set AFTER we have processed it.

With this approach our worst case is where the string is made of all unique characters and no dictionary word can be generated. We have a O(2^N) time as for each character in the string, we either add a new string or not to our exploration space and by tracking the seen nodes using extra O(N) space we avoid a factorial runtime. The space is O(N) since we track only the nodes that are currently in the queue, and because we do not add duplicates, we add at most 2N strings to the queue at any given time.

Another thing to observe is that the substring function cost is O(N), so generating all strings with one less character from a given string of length N is total O(N^2) time. For all nodes at each level we do O(N^2) work to generate all strings for the next level.

The final runtime would be (2^N + N^2) which is O(2^N)

You can check my implementation of minDeletionsToFormDictionaryWord on my Gist along with some tests in MinDeletionsToFormDictionaryWordJTests.

[Java] Reverse stack without using additional data structures

Given a stack, reverse it without using additional data structures. Example:

S     becomes    S      

1                4

2                3

3                2

4                1

Without additional structures, the only available space for us is the call stack, therefore we must use recursion.
 

To reverse a stack we need to pop all elements out and then add them back in order. However in a single pass this cannot happen, as we have access to only one element on the top and can store only one additional element in a variable.
 

We can however process the stack multiple times for each element in two phases:

(1) store top element from the given stack

(1.1) continue same logic until stack is empty

(2) push the top element at the bottom of the stack
 

phase (2) uses a similar logic, given an element X:

(2.1) store the current top element in the stack

(2.2) push X at the bottom of the stack

(2.3) add back the stored top at point (2.1)

The overall process is therefore:

  1. store top element from the given stack - X
  2. goto 1 until stack is empty
  3. store the current top element in the stack - Y
  4. push X at the bottom of the reduced stack
  5. add Y back to the stack

The runtime of this code is O(N^2) since for each element in the stack we traverse the whole stack again in phase (2), while space is O(N) since at any point in time we won't have more than N calls on the stack .

You can check my implementation of reverseStack on my Gist along with some tests in ReverseStackJTests.

[Java] Find value in sorted matrix

Given a NxM matrix (where M could be equal to N) which is sorted by rows and colums, find whether a value is in the matrix.

The sort property ensures that for a given row or columns, all values are in ascending order. It is not guaranteed, that the first value of each row or column is greater than the last value of the previous row or column, for example:

10, 20, 30, 40
15, 25, 35, 45
27, 29, 37, 48
32, 33, 39, 50

is a sorted matrix.

19/06/2021

[Java] Longest substring with K unique characters

Given a string s and an integer K, find the longest substring with K unique characters in it.

 

We use a sliding window technique to walk along the string while tracking for each character where did we see it last.

As soon as we have found K unique characters, we check if the current substring length is better than our temporary maximum and update the partial result accordingly. We repeat this check until there are K unique characters available in the current substring.

 

Until we find K unique characters, we just keep walking and track each character we see, as we can't yet produce a result.

When the current character was NOT seen in the current window AND by considering it we would exceed the K limit, we are in one of the following scenarios (sample limit is 2):
 

(1) oldest character to evict is on the start of the window

        abc
        i j
 

our window starts at i and we now reach character j. We now have 3 unique characters and our limit is 2, we need to evict the oldest which is A, and reduce our window before continuing. We simply move our window to the next character after A:

        abc
         ij

 

(2) oldest character to evict appears again within our window

        abcba
         i  j

 

our window starts at i and we now reach character j. We have again 3 unique characters within our window, we need to evict the oldest which is B and reduce our window. 

However we cannot simply move to the next, as we would still have 3 unique characters afterwards, we therefore need to jump to the last occurrence of that character (NOT after it as we would be cutting out a valid substring otherwise).


Every other character that we have found in between the two occurrences has to be removed as well, since the jump would cut them out anyway.

        abcba
           ij


To differentiate the two cases, we apply some logic and in case (2) we do not jump directly, rather, we move the window start pointer forward in a loop, deleting from our map all characters we encounter in between.

Additionally, we need to remember to KEEP the last known occurrence of the character at our window start range (B) which we definitely have deleted while shrinking the window in case (2).

The total cost is O(N) time and space as the inner loop is NOT repeated for each character in the string, rather would make us walk the string twice at most.

Additionally, given that our character set will always be limited, we could say space is O(1) as no matter what string we get, we won't ever store more than the number of available characters in our alphabet in our map.

You can check my implementation of longestSubstringWithKUniqueCharacters on my Gist along with some tests in LongestSubstringWithKUniqueCharactersJTests.

[Java] Find floor and ceil of binary search tree item

Given a BST and a value, find the floor and ceil items in the tree for it.

If item is in the tree, floor and ceil are the item itself, otherwise they are the closest smallest and largest values in the tree. If floor or ceil do NOT exist for an item, return them as MIN or MAX int.

Example:


                      8
                   4    12
                 2  6  10 14

For this tree, item 4 is in the tree, so floor and ceil = 4

Item 1 however is NOT in the tree, it would be a left child of item 2, so floor and ceil are MIN, 2

Item 5 is NOT in the tree but we have 4 and 6 as floor and ceil for it.

We can use a recursive DFS approach to scan the tree testing each node on our path to the correct location of the item we are searching for.

We start with floor = MIN and ceil = MAX int and we update our boundaries as we traverse the tree.

Worst case our tree is a single branch of increasing or decreasing elements, and our element is the largest or smallest NOT in the tree, therefore complexity is O(N) time and space.

You can check my implementation of findFloorAndCeil on my Gist along with some tests in FindFloorAndCeilJTests.

[Java] Invert binary tree

Given a balanced and complete binary tree invert it, all right children become left children and all left children become right children.

Example:

    a
/ \
b c
/ \ /
d e f

becomes:

  a
/ \
c b
\ / \
f e d

We need to visit each node and invert its children. We can accomplish this in two ways, BFS or DFS.

If we use BFS, we add nodes to a queue and process them in the order we first saw them, which means when we reach the last level we will have O(N/2) nodes in the queue, which is O(N) space.

If we use DFS we add nodes to a stack, or we approach the problem recursively (or we use a queue and remove from the end of it), then we will have O(log N) nodes in the stack since we keep going down to each leaf before returning, therefore we won't add more nodes than the path to the leaf, which in a balanced and complete tree is O(log N) at most, the height of the tree.

Note that the balanced and complete tree is the worst case as we have the most nodes to handle and each level is full.

You can check my implementation of invertBST on my Gist along with some tests in InvertBSTJTests.

18/06/2021

[Java] Longest common substring

Given two strings, find the longest common substring, if any.

We can simply scan both strings at the same time starting at each position and find our answer in O(NM^2) time and constant space.

However, we notice that we potentially redo work that was already done at previous iterations, we can therefore use extra O(NM) space to cache this work and reduce our runtime to O(NM) only.

We cache the max length of matching substrings for each pair of indices i,j in the two strings.

If we are at the beginning of a new substring, the first match will be of max length 1, otherwise subsequent matches will be previous max length + 1. For example

        01234
        abcba
        dbcbe


Starting at a, there is no match for it in the other string

Starting at b, we find a first match in position 1 in the other string, then a second match in position 3

Starting at c, we find a first match in position 2 in the other string, however at the previous iteration we had already found a match (i-1, j-1), therefore we expand on that match

Starting at b, we find a first match in position 1 in the other string, with no prior match for that position, the max length is 1. The second match in position 3 instead is expanding on the previous match for the string "bc", therefore we expand on that length. Our matrix looks like:

  0 1 2 3 4

0 0 0 0 0 0

1 0 1 0 1 0

2 0 0 2 0 0

3 0 1 0 3 0

4 0 0 0 0 0


While doing this, we keep track of the longest substring found each time and of the index where the longest substring ends.

After we scanned the two strings, our answer is the substring starting at end - length + 1 (to include first character) and ending at end + 1 (to include last character).

You can check my implementation of longestCommonSubstring on my Gist along with some tests in LongestCommonSubstringJTests.

Spoiler alert: using suffix trees and Ukkonen's algorithm, this can be solved in O(N+M) time instead.

16/06/2021

[Java] Count ways a message can be decoded and interpreted

Assume we take a plaintext string composed only of alphabetic characters and encode it substituting to each character a value: a = 1, ... , z = 26

Given an encoded string, determine in how many different ways can the message be interpreted. For example

10: only way is digit 10

11: can be digit 11 or digits 1, 1

This can either be done online (reading string left to right) or from a static string (right to left).

The idea is we walk the string and verify at each step that we have a valid sequence, if not we raise error. Invalid sequences are:

  • any non digit
  • 00
  • x0: with x > 2

For each character we look at the following one to determine if there is an increment in the number of ways we can decode the string, given two digits we have the following valid combinations:

  • 10: only one way, the number 10
  • 1x: with x > 0 and x <= 9, two ways, single digits (1 and x) or number (1x)
  • 20: only one way, the number 20
  • 2x: with x > 0 and x <= 6, two ways, single digits (2 and x) or number (2x)
  • xy: with x > 2 and y > 0 and y <= 9, one way, the single digits (x and y)

CAUTION: anytime we encounter a 10 or 20, we must track it as the PREVIOUS digit CANNOT use the 1 or 2 to count as double decoding since we must use that in pair with the previous zero. Example:

  • 110 - only 1 way: 1, 10
  • 1620 - two ways: 16, 20 - 1, 6, 20

Therefore we track finding a zero and lock the previous digit to it, we remove the flag only after we successfully process the previous digit (unless it's again a 0 or 10 or 20)

If we represent this as a graph, each "two ways" possibility represents a branching path, with each subsequent digit updating ALL branching paths, the number of leaves in the end is our result.
This updating ALL branching paths is expensive, in the order of N

[Java] Maximum sum of non adjacent numbers in array

Given an array of integers, containing zeros, negative values and duplicates, find the maximum sum that can be generated by picking non adjacent elements.

Since it's a sum, we need at least two operands, therefore we need at least 3 elements in our array.

If our array was made only of negative numbers, then the problem becomes finding the couple of non adjacent biggest negative numbers, that is ONLY two numbers. Therefore in that case, the logic is incompatible with the one to solve for positive numbers as we want to add MULTIPLE numbers in that case.

We will therefore treat negative elements in our array as value 0 to make them not an interesting pick for our logic and therefore skip over them. The problem definition itself suggests how to solve it, given the triple:

(1) sum of previous elements excluding last: this sum excluded element in position current - 1

(2) sum of previous elements including last: this sum included element in position current - 1

(3) current element

should we choose (1) + (3) or (2)?

And we repeat this question for each element in our input. As for the base case, we've already seen that we need at least 3 elements to produce a valid answer, which in case of exactly 3 elements is simply first + last, no matter which value was in second place.

So our logic makes sense starting from element in position 2. What values we set for elements in position 0 and 1?

position 0: would be the choice for elements coming at a non existing index -1 and -2, we set it to first element in our array

position 1: would be the choice of keeping element in position 0 or adding ourselves to element in non existing index -1, we therefore pick the maximum between ourselves and the element in position 0

At the end, we simply look at the two sums and pick the biggest.

You can check my implementation of maxSumNonAdjacentElements on my Gist along with some tests in MaxSumNonAdjacentElementsJTests.

10/06/2021

[Java] Manacher's algorithm for longest palindromic substring

Last time we saw a quite simple O(N^2) solution to find the longest palindromic substring using constant space.

It turns out there is an efficient O(N) solution that uses O(N) additional space, proposed by Glenn Manacher.

I suggest watching this video to better understand how does the algorithm work and see a compact solution that works well for both even and odd length palindromes.

09/06/2021

[Java] Longest palindromic substring

Here's another string problem: given a string, find its longest palindromic substring.

The first character in the string, is always the center of a palindrome of length 1, therefore we can start investigating from the second position (if it exists).

For any string length > 1, our palindromic substring could be centered on any odd position (any character in the string) or even position (between any two characters in the string).

We can therefore for each position search in the string for the longest palindromic string centered there. We do this in O(N^2) time and constant space by looping over each character in the string and executing two separate loops expanding outwards from the current position.

When searching for odd length palindromes, we start with both indices for the inner loop on the current character, and for even length palindromes we set the start index on the character BEFORE us.

The inner loop breaks when the characters pointed at by the two indices are different, since we are NOT nesting these loops but execute them in a sequence their cost is O(N+N) which is O(N). The outer loop is O(N), therefore the total cost is O(N ^2)

You can check my implementation of longestPalindromicSubstring on my Gist along with some tests in LongestPalindromicSubstringJTests.

08/06/2021

[Java] Test whether two numbers are at Gray distance

The binary representation of two numbers respects the Gray code if the numbers differ by only one bit.

Spoiler alert, our magic formula is part of one possible solution :)

Let's see some examples

1 = 01

2 = 10

They are not at Gray distance since they differ in both bits.

1 = 01

3 = 11

They are at Gray distance since they only differ in one bit. 

We could loop over all bits in both numbers and check how many differences we find but that takes time proportional to the bits in the number (arguably with the big-O tricks since the number of bits is always limited to a fix amount we could discuss whether it's constant time).

However we remember our magic formula from last time to get the lowest set bit in a number and want to use it again. Our approach could be:

x = XOR the numbers to get only the different bits. If they are Gray, there must be only one different bit.

y = getLowestSetBit(x): to get the lowest set bit in x with our magic formula

x - y == 0: if there was only one set bit x and y must be equal therefore the difference must be zero

 

When looking at the video that proposed the exercise, they present another valid solution:

x = XOR the numbers to get only the different bits. If they are Gray, there must be only one different bit.

y = x - 1: if there was only one set bit now we removed it from that place and therefore x and y don't align anymore on that bit

x & y == 0: if there was only one set bit, x and y don't align therefore the bitwise AND will give all zeros


One thing to note in both cases, is that we are talking about unsigned sequences of bits, but Java has no unsigned type, therefore in some cases (two numbers are equal or negative) we wouldn't get the correct result as going into negative territory changes the whole number in Java and not only the sign bit.

[Java] Longest substring without repeated characters

Given a string s, find the longest substring without repeated characters in it.

It is immediately evident that by using some O(M) extra space (M is characters in alphabet) and a sliding window approach we might be able to find our solution in O(N) time (N is length of string). We just need to determine how to correctly shrink our window when a repetition is found.

[Java] Find first missing positive integer in array with duplicates, zeros and negative values

Here is a problem: given an array of integer which can contain duplicates, zeros and negatives, find the first missing positive integer.

If the array does not contain a positive integer at all, the answer is 1, if it contains all integers between 1..N, where N is the length of the array, the answer is N+1.

The requirement is to do this in linear time AND constant space. The input array can be modified.

I think the difficulty of this problem comes from the fact that we allow duplicates and do not want to use additional space.

(1) My first thought was to follow a logic similar to finding a repeated element using Brent's algorithm, Floyd's algorithm and binary search. But we can't apply those since we can have negative values, missing items and multiple duplicates.

If we wouldn't allow duplicates, this would be a very simple sum of all items excluding zeros and negatives, then comparing the sum with the expected sum of all numbers 1..N where N is the amount of positives we encountered which is quickly done with the formula: 

N(N+1)/2

Our result would then be the difference of the two.

Using extra space we can also solve this easily by creating a set of seen numbers, then doing a loop 1..N and testing whether the given element was seen. Our answer is the first unseen number.

Another solution would be if we could sort the array in linear time, which in some cases, for example radix sort or counting sort, can be achieved, albeit with some limitations: the range of allowed numbers must be known AND small enough, additionally, we require extra space.

[Java] Partition array moving all zero and negative elements at the end

We've already seen a possible solution for the Dutch flag problem where we partition an array given a pivot, so that all elements smaller than the pivot appear first, then all elements equal to it, then all elements bigger.

In this variant, we do not care about smaller, equal, bigger so we can use simpler logic to move all zero and negative elements at the end of an array.

Using two converging pointers starting at each end of the array, we can test pair of elements and swap if necessary. When the pointers meet, we are done and can also return the index marking the end of the positive portion of the array.

You can check my implementation of partitionArrayZerosAndNegativesLast on my Gist along with some tests in PartitionArrayZerosAndNegativesLastJTests.

07/06/2021

[Java] Find integer subarray which sums up to zero

I was watching this video regarding the following problem: given an integer array, find a subarray that sums up to 0.

In my opinion the whole array is a valid subarray, however, a single element isn't, since the problem states we need to perform a sum operation which requires at least two elements.

I decided to implement against that requirement, which slightly modifies the approach.

The overall idea is the same, we walk the array and keep summing up elements and track which sums we have seen so far. If there exists a portion of the array that sums up to 0, its contribution to the overall sum will be 0 and therefore we will find a duplicate result within our sums set. We could also simply have a sequence that sums to 0 directly, eg [-1,1] which is also a valid result.

In this variant, if we encounter a zero, we simply skip it but if we have two consecutive zeros, then we can return the [0,0] subarray which is valid as it includes at least two operands.

Additionally, we do not track the indices for each encountered sum, instead, as soon as we find a duplicate result or the current sum is 0, we stop walking the array and go backwards decrementing our current sum until we subtract enough from it to match our target (the duplicate result we're looking for).

All indices between where we stopped and where we arrived now are our subarray of sum 0. This subarray will be the smallest possible subarray to sum up to 0, therefore won't contain leading or trailing zeros, eg: [0, -1, 1] input will return [-1, 1] result.

This algorithm runs in O(N) time and O(N) space since we need an additional set to store the seen sums.

You can check my implementation of findSubarrayOfSumZero on my Gist along with some tests in FindSubarrayOfSumZeroJTests.

[Java] Threesums, find all triples from given set of numbers that sum up to a given target

Finally a clickbait-worthy exercise :)

Given a set of integers and a target K, find all distinct triples that sum up to K.

Two triples are distinct if their content is different, order does not matter, eg: 1, 2, 2 and 2, 2, 1 are the same triple.

A first approach might be to explore all possible combinations and keep creating tuples then triples and verify which ones respect the a+b+c = K condition.

We can use a queue and add elements to it until we have formed a triple, each time expanding the given input by one valid element only (therefore ignoring elements we already used).

This can also be done with a triple nested loop, but the complexity is in the O(N^3) time.

[Java] Circular right and left bit shifts

When I touch bit problems I always remember why we introduced so many abstraction layers between us and them.

The problem we have is as follows, given a number perform a circular shift of all its bits right by a given amount.

Example: 110 with a circular right shift of 2 becomes 101

Provide also a similar method that does a left shift, example 110 with a circular left shift of 2 becomes 011

Now in a perfect world, we can notice that the result is simply the bit by bit combination of both right AND left shift operations on the number. In our example 110 with a right shift of 2:

  • 110 with a simple right shift of 2 becomes: 001
  • what we dropped off the right would need to go the left of our result, we can get it with a left shift of (length - shift) positions: 100
  • combine the two bit by bit and we get the result: 101

Similarly we can do the same for left shift.

All good, however, due to how things are represented in Java we run into some issues:

  • right shift operator >> preserves sign, we need to use bitwise operator instead >>> 
  • for left shift instead the logic is the same, therefore there is only a << operator
  • integers are 32 bits, which is 4 bytes, when we define which input in bits we are handling, we need to convert it to the integer representing that 32 bit sequence, for example 10000000000000000000000000000000 (1 followed by 31 zeros) is 2147483648 which you will recognize as being just after the maximum allowed integer, therefore NOT an acceptable integer, however a perfectly fine 32 bit sequence. In other words, considering signed integers, that sequence would be -0 which does not exist
  • negative numbers are represented using two's complement, therefore -3 is represented not as 1_then_29_zeros_11 but start_30_ones_01 instead which breaks our logic since the bitwise shift operators would move bits set to 1 which we instead assume are set to 0 instead when we work on negative numbers

It might be therefore worth it to work with Byte objects rather than integers in order to use purely unsigned bit representations.

You can check my implementation of rightRotateShiftBits and leftRotateShiftBits on my Gist along with some tests in BitUtilsJTests.

[Java] Merge K sorted arrays

Given K sorted arrays of different lengths, merge them in a single sorted array.

If we had two arrays only, we could solve this in O(N+M) time by keeping two pointers, one for each array, and at each iteration we compare the current element for each to decide which one to add to the result set, then we advance only that pointer.

When we can't move anymore, if we still have elements left in one array, we add them all.

Expanding this approach to K > 2 yields a O(NxK^2) time unfortunately since we can't just as easily do the element comparison to find the smallest out of all arrays and we would therefore loop over K elements each time to find our pick.

We know sorting an array takes O(N log N) time, therefore we could simply append all arrays together and then sort them in O(NK log NK) time.

However our approach for K = 2 can be expanded a bit to achieve a slightly worse time than linear which is still better than this append and sort approach.

If we had an efficient way of tracking and retrieving the smallest item of a given set, we'd have our solution. This luckily comes in the form of a MinHeap.

We can therefore for each array, insert into the heap a Pair(value, Pair(index of this array in input, index of this value in this array)) and also provide a comparator that only looks at the values.

By initializing the heap with the very first element of each array, we are ready to start our logic. We keep polling the heap, which will return the smallest of all values and add that to our result (we also need to track how many items we have inserted so far in order to know exactly where to place the next one).

If the source array of that value has more elements, we add the next element to our heap.

When our heap is empty, we have processed all input.

This uses O(K) extra space but results in O(NxK log K) time since we only keep maximum K elements in the heap at each time.

If any input array is null or empty, we raise an exception. We could just as easily decide to skip it altogether instead. 

This has also the added benefit of allowing us to sort huge arrays as we can keep on disk the input and output and only need to keep in memory K items at a time.

You can check my implementation of mergeKSortedArrays on my Gist along with some tests in MergeKSortedArraysJTests.

06/06/2021

[Java] Serialize and deserialize a binary tree to a string

Given a binary tree, we need to produce a string representation of it which we can later use to reconstruct the same tree.

The difficulty here is in handling the information correctly, for example a tree with holes (eg not all levels are full) MUST communicate this in our serialize data, otherwise we can't rebuild it correctly.

Now we have seen in the past it's possible to reconstruct a binary tree from its inorder and postorder visits, therefore we could simply do those and serialize them in our string.

[Java] Check if two binary trees are equal

A simple problem necessary for another one I've been working on: given two binary trees, determine whether they are equal.

A O(N) solution time and space is simply to do a parallel BFS on both trees and break as soon as we find a non matching value.

You can check my implementation of areEqualBST on my Gist along with some tests in AreBSTEqualJTests.

[Java] Product of all elements in array excluding each position

Given an array of integers, return another array which contains in each position the result of the multiplication of all elements in the array, excluding the element in that position.

Example: [1, 2, 3] results in [6, 3, 2]

We have two variants, one allows zeros and the other doesn't. Additionally we have an optional challenge to try and solve it without using division.

We can see the O(N^2) solution with two indices works find and uses O(1) space.

However there must be some mathematical property or observation that we can use to reduce that time to O(N).

Variant 1: no zeros allowed and use division

We can notice that to get the answer for each position we need to multiply ALL elements in the array, then for each position divide this total for the element itself to exclude it. If no element is 0, we can do this in two passes.

Variant 2: zeros allowed and avoid division

Another thing we can notice is that for each position, the result is the multiplication of ALL elements on the LEFT and RIGHT of that particular position.

We can therefore use extra O(N) space and create two new arrays, leftToRight and rightToLeft which we fill scanning the input in the two directions and setting for each position the result of the multiplication of the element in the PREVIOUS (left to right) or NEXT (right to left) position in the input for the element in the PREVIOUS (left to right) or NEXT (right to left) position in the corresponding temporary array.

Since the first element in both cases has no predecessor or successor, we set it to 1 and start our loops from the next element.

Now we just need to combine these partial results by multiplying them together at each position.

You can find my implementation of prodAllExcludingElementAtINotAllowZero and prodAllExcludingElementAtIAllowZero on my Gist along with some tests in ProdAllExcludingElementAtIJTests.

05/06/2021

[Java] Find Kth most frequent elements in given set

This is an interesting problem: given a set of words, rank them by their frequency and return the k-th most frequent ones.

If K is bigger than the lowest rank, we return an empty set. Arguably, we could return the least frequent words instead. 

This question quickly becomes more philosophical since a "best" solution is really tied to what real use case you intend to solve:

  • are you going to parse a dataset once and execute multiple queries with different K values on it?
  • are you going to parse different datasets multiple times and execute few queries with different K values on it?
  • are you going to parse datasets so big the data won't reasonably fit in memory?

Here are some approaches.

Option 1: count and sort

We can see that by using extra O(N) space we are able to return a solution after processing the input a bit:

  • in O(N) time we can process the input words and build a map(word, frequency)
  • in O(N) time we can process that map a build another map (frequency, set of words)
  • in O(M log(M)) time we can process this second map to sort it by frequency (using a reverse natural ordering comparator)
  • in O(1) time we can return the desired result

Now this approach would yield O(N + M log(M)) worst case time and uses O(N) extra space.

An upside of this approach is that after this one time processing, we can query for any valid K.

 

Option 2: count and heap

  • in O(N) time process the words and build a map(word, frequency)
  • in O(N) time process that map and build another map(frequency, set of words)
  • in O(M log(K)) time we add all pairs(frequency, set of words) to a max heap of maximum size K using the frequency as the comparator
  • in O(log(K)) time we pop all values from the heap and get our result

Assuming K<<N we would have O(N + M log(K) + K log(K)) which is O(N + M) time and O(N) space.

An upside of this approach is the better theoretical speed but for a different value of K we need to do again O(M) work to recompute the new heap rankings, while with the previous approach we already get our solution in O(1) time.

Last note, if M<<N the theoretical time is O(N) in both cases.

You can check my implementation of option 1 in findKthMostFrequentWords on my Gist along with some tests in FindKthMostFrequentWordsJTests.

[Java] Test whether a tree is a valid BST

Given a binary tree without repeated elements, we want to test if that tree also a valid binary search tree.

A BST is a tree where each node must respect the property that its location with respect to its parent will be on the left side is the node's value is less than the parent's value, otherwise it will be on the right side.

This means that when inserting a node we move left or right until we find the correct location for a node and its location is constrained by values coming from previous levels, for example a node X can NEVER be in the RIGHT subtree of a node that has a higher value than X.

We can therefore use a recursive approach to visit the whole tree, setting the correct boundaries at each call. If a node is out of those boundaries, the tree is NOT a BST.

Another approach would be to perform an inorder visit of the BST, which should return the elements in their natural order IF the tree is a valid BST. Therefore we can scan the elements in this visit and test whether each one is smaller than the next.

Both approaches are O(N) time.

You can find my implementation of isValidBST and isValidBSTWithInorderVisit on my Gist along with some tests in BSTJTests.

[Java] MaxStack, a Stack that tracks the maximum element in it at any given time

This exercise is about defining a data structure MaxStack which behaves exactly like a Stack but offers a method max that can return at any given time which element is the biggest in the stack.

It is fairly quick to see we can create a class that wraps the Stack and has O(N) cost for the max function and O(1) cost for push and pop.

If we use additional structures to reduce the cost of max operation, we end up using additional space O(N), for example using a MaxHeap will provide the result of max in O(1) time but increases our push and pop cost to O(log N).

We can notice however that a Stack is simply a list where we always insert and remove from the head. Additionally, at each given level (node) the maximum in the stack can only be the maximum between the node value and all elements before it.

This means we can track two pieces of information: value and max and add a bit of logic to our push so that we always know at each level what is the maximum in the stack until that point. Specifically, each new node will compare its value with the maximum of the previous node and set its local maximum to the biggest of the two.

This has the additional benefit of using O(1) extra space AND keeps our operations O(1) time too.

You can find my implementation of MaxStack on my Gist along with some tests in MaxStackJTests. I use a helper MaxStackNode structure but could just as easily have used a List of Pair instead.

The class accepts a generic type as content, however the type must implement Comparable interface. It is also possible to explicitly provide a Comparator in input in case you require a specific ordering (eg: reverse of natural ordering).

Null values are always placed last in the calculation of the maximum.

[Java] Find couple of numbers that sum up to K

Here is a quick exercise: given a target number K and a set of numbers, find, if any, a pair of numbers that sum up to K.

We can do this in O(N) time and space by walking on the input set of numbers and recording somewhere all the complements we would need to find to sum up to K.

Example: target is 10 and input is 1, 5, 7, 3.

If we could sum up to 10 using 1, we'd need to have a 9 available. If we encounter a 9 while walking the array, we know we have the pair (1,9) and have one possible solution.

You can check my implementation of numbersAddUpToK on my Gist along with some tests in NumbersAddUpToKJTests. I make use of a Pair object as well.

[Java] Pair class of generic types

Something I've always found weird is that Java does not offer a Pair class representing a couple of objects. Sure there are many offerings around and technically a Pair class IS offered, but still uses the key,value naming which is strange. The closes standard offering we have is Map.Entry which still uses the key,value naming.

I've read some philosophical theories that you should not use a generic Pair since getFirst and getSecond are not meaningful names and should instead write your own pair classes with relevant names, however I cannot fully get behind this as for a similar reasoning also key,value is not relevant naming: is the key the ID of this value person, or is the key a character and the value the number of times we found it in a string?

Anyway, I needed to use pairs and decided to write my own class, the only challenge is how to correctly implement the equals method since we're handling generic types which also means our fields might be null.

04/06/2021

[Java] Calculate in how many ways can a target integer be reached using a given set of step increments

Very similar to finding the minimum amount of coins to make a certain amount, we have another problem: given a target number and a set of step increments, calculate in how many ways can we reach the target starting from 0. If the target cannot be reached, return -1.

As in the previous case, dynamic programming would help reduce the time needed to reach a solution, however our trusty BFS approach can also be applied here, we just need a couple tweaks in the logic.

Example: target is 4 and steps are 1, 2. Our graph now would be:

           0

       /      \

      1        2

     /  \      /  \

    2   3      3   4

   / \    \     \

  3  4     4     4

/

If you count the leaves, the result is 5 possible ways. Now notice that as long as we keep going down a specific path, we do NOT increase the number of ways we can get to our target, in fact, the full path IS one way of reaching the target, therefore when we deviate, eg when we have more than one link to follow moving forward, we increase the number of ways since we are making a new path.

Additionally, the faster routes (using bigger step increments) will reach a specific node faster than routes following smaller increments. This is again guaranteed by the nature of the BFS. We therefore know, that once we encounter an already discovered node, we were the last ones to get there. The number of ways to reach that node can therefore be updated to include the number of ways to reach us and we can stop exploring that route.

If instead we reach an undiscovered node, the number of ways to reach it, is exactly equal to the number of ways to reach us since we were the first to get there.

By tracking for each node the number of ways we can get there in a separate map, we can easily find the solution. If the map does NOT contain an entry for the target, it means it was impossible to reach given the provided steps.

We can again remove an entry from the map after we visited it, since we won't ever get back there again.

The complexity is O(NxM) and space is O(N).

You can find my implementation of howManyWaysforTarget on my Gist along with some tests in HowManyWaysForTargetJTests.

     

[Java] Calculate minimum number of coins necessary to make a given amount

This is a common problem, given a target amount N and a set of coins M with different denominations, what is the minimum number of coins necessary to make the desired amount, if at all possible.

Coins have all integer positive values. If amount cannot be generated, return -1.

The textbook approach would be dynamic programming where we find out which calculation to perform at each step and cache calculated values for future reference to reduce the computation time at the cost of space.

However, one can notice that this problem can also be tackled with a BFS graph search, where nodes are the possible amounts and edges are the coin denominations. New nodes are calculated by subtracting a coin from the current amount. 

If the result of a subtraction would yield a negative amount, we skip that path as there is no possible way to reach the desired amount with those choices.

Since BFS will ALWAYS encounter the node matching our target first, we are ensured by then we have our result, in this case the minimum number of coins.

For example: target is 10 and coins are 1, 3, 5. Our graph would look like this:

                                  10

              /                    |                  \

           9                     7                     5

      /   |   \               /   |   \            /   |    \

    8     6    4            6     4    2           4    2    0

  / |\    /|\  /|\ .....

.....

And so on. Notice that the first time we reach a node we followed the fastest way to get there, therefore when we would travel to an already discovered node, we can stop going down our path since our path would use more coins than the existing one and therefore won't lead to the solution.

Additionally, the first time we encounter a node with value 0, we know we have our answer. If we explore the graph without reaching a node with value 0, it means the given amount cannot be made with the given coins and therefore there is no answer.

Another noticeable thing, is that we use a map to retrieve data while exploring, however, there is no need to keep all the data in the map at all times, since after we explored a node, we won't ever go back there again, therefore a small space optimization can be to remove nodes after they are visited.

As last observation, the Map is not mandatory if one can track nodes in the form (amount, minCoinsSoFar) AND can use a Set then to store already visited nodes.

The overall runtime is O(NxM) and space is O(N).

You can find my implementation of minimumCoinsForChange on my Gist along with some test cases in MinimumCoinsForChangeJTests.

03/06/2021

[Java] MrJack text based videogame

MrJack is a nice boardgame for two players, an inspector and a thief, that take alternating actions trying to win within 8 turns. A full game takes 30 minutes to one hour usually.

The game features an hexagonal board with different cell types and 8 character tokens, each with a unique ability.

Players must carefully plan each move since luck is not the main factor in their success.

You can find FAQ, rules and reviews about it on boardgamegeek.

I propose a videogame implementation which is text based and uses ASCII art to represent the board, prompting players for actions via command line.

The first version 1.0.0 is fully playable and can be run as a Java application. You can find the full description as well as the source code in mrjack repository on my Github.

The code is released under a CreativeCommons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) licence.

I am NOT affiliated with HurricanGames and/or the creators of the board game and do NOT own copyrights to the game.