07/11/2017

[Java] Convert expression to reverse polish notation

The closing chapter of my RPN journey, how to convert an infix notation expression to a postfix notation one, so from 3 + 4 we get 3 4 +

Here as well it appears immediately that a String is not the most comfortable input structure, so we stick to an array.
First idea was to scan the input and tokenize it in a left side portion and right side portion for each operator we found:

3 - 4 x 5

would become first:

3, 4 x 5, -

Then finally:

3, 4, 5, x, -


This would also work with parentheses but it poses a challenge on how to recognize the portion to process, meaning we would work in square time if we need to scan the array every time in order to find the ending portion of the subexpression we are evaluating. Maybe this drawback could be eased by keeping a pointer that delimits the search area and decreases its size with every search, but it already looks a bit complicated.

Second idea building on that one, is to create a structure to help with this kind of processing, such as a tree, then to get result we would do a postorder traversal. This idea sounds already better but with a small caveat in that building the tree correctly is not very easy to handle; we have to handle many subtrees which - if the expression is well formed - would be merged into a single one in the end. By the way, this approach is useful when parsing more complex expressions or languages and generates a tree known as abstract syntax tree.

Now, can we do better than that for this specific problem? Well, yes. Dijkstra - this guy was a genius - comes to rescue with his shunting yard algorithm which gives a simple process to parse the input expression and generate the RPN version in O(N) time and space. Given that's his algorithm, I will not repeat the description and you can find my implementation for convertToRPN on my Gist along some test cases in RPNJTests.

I just have a small note on the algorithm as described on the Wikipedia page; the pseudocode says:

    if the token is an operator, then:
        while there is an operator at the top of the operator stack with
            greater than or equal precedence and the operator is left associative:
                pop operators from the operator stack, onto the output queue.
        push the read operator onto the operator stack.


Pay attention to the marked part, it implicitly means that encountering a parenthesis would break the loop :)
This expression would otherwise fail:

(3 - 4) x 5

Since when we encounter '-' we have no idea how to compare it to '(' which is at the top of the stack


No comments:

Post a Comment

With great power comes great responsibility