Ok, this is a very specific exercise and a bit hard to summarize in a short title, but in the end it is just an algorithm working on lists, more specifically on lists of lists.
The problem is: given a list of ordered lists as input, where each element is a (date, value) pair, generate as output a single list, ordered by date, where for each date the value is the sum of the values for that date across all the lists. If a list does not contain that date, use that list's previous value, if any. The date can be represented as a long to ease the comparison logic; otherwise we would need to provide a comparator.
For example, consider this representation:

      5/5   5/6   6/11  6/12
A     10          20
B     20    10          20
C           10          20
The input would be given as a list of lists:
[
[(5/5, 10), (6/11, 20)]
,[(5/5, 20), (5/6, 10), (6/12, 20)]
,[(5/6, 10), (6/12, 20)]
]
The result has to be: [(5/5, 30), (5/6, 30), (6/11, 40), (6/12, 60)]
Because we track the following phantom values, the "holes":
      5/5   5/6   6/11  6/12
A     10    [10]  20    [20]
B     20    10    [10]  20
C     [0]   10    [10]  20
So, an easy solution would be to first find all the unique dates in O(MxN) time, where M is the number of lists and N the number of elements per list, then build an MxN matrix, filling the holes where we don't have a value, and finally sum up each column to produce the result. This solution works in O(MxN) time and space.
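A sketch of this matrix approach could look like the following (the class and `Entry` record names are mine, standing in for the Gist's DateNode; dates are plain longs, e.g. 505 for 5/5; note that sorting the collected dates adds a log factor in this naive version, which a k-way merge of the already-ordered lists would avoid):

```java
import java.util.*;

public class MatrixMerge {
    // Stand-in for the post's DateNode: the date is a long (e.g. 505 for 5/5).
    public record Entry(long date, int value) {}

    public static List<Entry> merge(List<List<Entry>> input) {
        // 1. Collect the unique dates and sort them.
        Set<Long> dateSet = new HashSet<>();
        for (List<Entry> list : input)
            for (Entry e : list) dateSet.add(e.date());
        List<Long> dates = new ArrayList<>(dateSet);
        Collections.sort(dates);

        // 2. Build the MxN matrix, filling each hole with the previous
        //    value of the same row (0 before the first real value).
        int m = input.size(), n = dates.size();
        int[][] matrix = new int[m][n];
        for (int i = 0; i < m; i++) {
            List<Entry> row = input.get(i);
            int k = 0, prev = 0;                 // k walks the i-th input list
            for (int j = 0; j < n; j++) {
                if (k < row.size() && row.get(k).date() == dates.get(j))
                    prev = row.get(k++).value(); // real value for this date
                matrix[i][j] = prev;             // otherwise carry the hole value
            }
        }

        // 3. Sum up each column into the result.
        List<Entry> result = new ArrayList<>();
        for (int j = 0; j < n; j++) {
            int sum = 0;
            for (int i = 0; i < m; i++) sum += matrix[i][j];
            result.add(new Entry(dates.get(j), sum));
        }
        return result;
    }
}
```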
Can we do better? Time-wise, no, since we need to scan all the input values anyway. Space-wise, definitely: the whole reason to avoid a matrix in the first place is that we don't want that much space being taken up!
An idea that uses O(1) additional space but runs in O((MxN)xN) time: build the result list while walking through each input list. First we collect all the dates in O(MxN) time; then, for each of the O(N) dates we collected, we scan the input lists again, possibly multiple times when we run into holes, which costs O(MxN) per date. We can do this scanning with two pointers, for example, one moving one step ahead of the other, in order to find the hole boundaries in each list and recover the correct value.
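A sketch of this rescanning idea (again with my own names; for brevity I collect the dates into a set here, while a strict O(1)-extra-space version would merge the dates directly into the result list). The helper walks a list from the start for every date, which is where the extra factor of N comes from:

```java
import java.util.*;

public class RescanMerge {
    public record Entry(long date, int value) {}

    // Value contributed by one list at date d: the value of the last
    // element whose date is <= d, or 0 if the list has not started yet.
    static int valueAt(List<Entry> list, long d) {
        int v = 0;
        for (Entry e : list) {
            if (e.date() > d) break;  // lists are ordered: we crossed the hole boundary
            v = e.value();
        }
        return v;
    }

    public static List<Entry> merge(List<List<Entry>> input) {
        // Collect the unique dates, sorted.
        Set<Long> dateSet = new HashSet<>();
        for (List<Entry> list : input)
            for (Entry e : list) dateSet.add(e.date());
        List<Long> dates = new ArrayList<>(dateSet);
        Collections.sort(dates);

        List<Entry> result = new ArrayList<>();
        for (long d : dates) {
            int sum = 0;
            for (List<Entry> list : input)
                sum += valueAt(list, d);  // O(N) rescan per list, per date
            result.add(new Entry(d, sum));
        }
        return result;
    }
}
```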
A faster solution that uses O(M) additional space and runs in O(MxN) time: first collect all the dates in O(MxN) time, then create an array of pointers, one into each input list, and for each date we found (O(N)), scan the pointer array (O(M)) and perform one of the following actions:
- if the date of the current element is the one we are considering, add its value to the result but do NOT move to the next element yet, because a hole might follow
- if the date of the current element precedes the one we are considering, check whether there is a following node:
-- if yes, is its date bigger than the one we are considering? If so, we are inside a hole, so use the current value and do NOT move forward. If not, it must be equal to the one we are considering, meaning the hole has been filled, so FIRST move forward and THEN use the value. We can't have a situation where the next date is smaller, since the lists are ordered
-- if no, this is the last element of the current list and only holes follow it, so keep using the same value and never move forward
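The cases above collapse into a short loop. A sketch under my own naming (not the Gist's): `pos` is the O(M) array of per-list pointers, and I added an explicit case, implicit in the post, for a list that has not started yet, like C at 5/5, which contributes 0:

```java
import java.util.*;

public class PointerMerge {
    public record Entry(long date, int value) {}

    public static List<Entry> merge(List<List<Entry>> input) {
        // Collect the unique dates, sorted.
        Set<Long> dateSet = new HashSet<>();
        for (List<Entry> list : input)
            for (Entry e : list) dateSet.add(e.date());
        List<Long> dates = new ArrayList<>(dateSet);
        Collections.sort(dates);

        int m = input.size();
        int[] pos = new int[m];               // one pointer per input list: O(M) space
        List<Entry> result = new ArrayList<>();
        for (long d : dates) {
            int sum = 0;
            for (int i = 0; i < m; i++) {
                List<Entry> list = input.get(i);
                if (list.isEmpty()) continue;
                Entry cur = list.get(pos[i]);
                if (cur.date() > d) continue; // list not started yet: contributes 0
                // If the current date precedes d and the next node fills the
                // hole (its date equals d), FIRST move forward...
                if (cur.date() < d && pos[i] + 1 < list.size()
                        && list.get(pos[i] + 1).date() == d)
                    cur = list.get(++pos[i]);
                // ...THEN use the value: either the exact value for d or the
                // carried value of a hole. We never advance past d, because a
                // hole might follow the current node.
                sum += cur.value();
            }
            result.add(new Entry(d, sum));
        }
        return result;
    }
}
```

Since the dates are processed in increasing order and a pointer advances as soon as its next node's date matches the current date, the next node's date can never be smaller than the date under consideration, which is why the equality check above is enough.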
By the way, the idea of using a TreeMap/TreeSet to track the result and/or to collect the dates initially was discarded because of the O(logN) insert and update time complexity. At first it sounded like a good idea, since it makes it easy to keep all the dates ordered while scanning the lists, but the pointer-array logic would still need to be applied afterwards (we still need to sum up the values AND consider the holes), so I think it's better to settle for a simple list instead (which, by the way, is also the output :) ).
You can check the implementation of the mergeDateLists method on my Gist, along with the definition of the DateNode class and some test cases.