Pages

01/07/2021

[Java] Length of longest filesystem path

Wow this exercise was harder than expected, also yay for String methods.

Given a string representing a filesystem, find the length of the longest path to a file.

The string has the following format (without spaces):

root  \n \t dir \n \t \t file_in_dir.ext

which represents the path: root/dir/file_in_dir.ext that has a total length of 24 characters.

We are told filenames will always include a dot followed by an extension and directories will never contain a dot.

The problem is similar to parentheses matching one, where we keep adding elements to a stack until we encounter a specific element that makes up pop out. Specifically, we notice that we go deeper in the path until we find an element at same or higher level than the current one, in which case we move back to a higher folder in the path.

Assuming that the input is always well formed though, we have an easier time finding, writing and testing a solution.

We will also be taking advantage of lastIndexOf and split which both work in O(N) time where N is the length of the string. This is another thing I noticed, where avoiding usage of those methods is possible and would result in better overall runtime (although in same upper bound), but would make the code messier to write and debug on a whiteboard. But if we use those methods, we better make sure we understand exactly how they work first ;)


We use a stack and keep adding lengths of portions of path to it where the last (lowest) level is always on top.
For each item we parse, we calculate its length and (if it is a filename) update a max variable appropriately.
The length is length of last level before us (top of stack) + our length, we also need to convert each level to a / so subtract level + 1, example (without spaces):

"root \n \t dir \n \t \t file_in_dir.ext" is "root / dir / file_in_dir.ext" where we ignore "\n" and each sequence of "\t" became a single "/"

length - level + 1 is used to add a single "/" in place of each sequence of "\t" since we use lastIndexOf method which returns the index BEFORE the last match for the given pattern, or -1 if it is not found

Each time we know at what level we are by counting the number of starting "\t" we find.


When our stack has more items the the current level, it means we navigated up from the last position, so we pop elements until the stack contains exactly as many items as our current level.
CAUTION: since root is level 0 and stack with one item has size 1, we compare with level + 1

Example (without spaces):


root \n \t dir \n \t \t file_in_dir.ext \n \t \t other_file.ext

"root" is level 0, and length 4, push 4 to stack
"dir" is level 1, and length 3, giving a total path of 7 (last level was 4), push 7 to stack
"file_in_dir.ext" is level 2 and length 15, giving a total of 22 (last level was 7), push 22 to stack
Additionally, since "file_in_dir.ext" is a filename (it contains a dot) update max to 22
"other_file.ext" is level 2 but stack has now 3 elements, pop one off to reach our level. We are length 14, giving a total of 21, push 21 to stack.
This is also a filename, but it's not longer than the max, leave it as is

At the end we will have our max in our variable.

Total cost is:

O(N) to split the string in pieces plus: for each piece of length L,  we scan it once to find the last index of "\t" in O(L) time, then we might perform at most O(number of pieces in the stack) pop operations. This does NOT always happen as we only do so when we are moving to a different level, which means for some strings we will NOT pop.

We scan again the piece in O(L) time to verify whether it contains a dot.

In total we process the piece in worst case O(L + K) time where K is the size of the stack. This operation is however balanced by those pieces where we do not pop, but only push.

For the total string length N, we therefore parse it 3 times (split, then for each piece lastIndexOf and  contains) and perform at most O(N) pops from the stack, therefore our final runtime is O(N).

Regarding space, the most we use is a full stck where we never popped pieces off, which will be O(N) as well.

You can check my implementation of longestFilesystemPath on my Gist along with some tests in LongestFilesystemPathJTests.

No comments:

Post a Comment

With great power comes great responsibility