What is Parsing?
When you read an expression, like
1/2+3.4, you can immediately understand some of its meaning. You recognize that there are three numbers, and that the numbers are combined with operators. You may also recall that division has higher precedence than addition, so you would divide
1/2 before adding
+3.4 when evaluating the expression.
Compare this to how you perceive
2H3SGKHJD. At first glance it appears to be a nonsense sequence of characters. If I told you that letters should be grouped in pairs with
G being a separator, your mental model might look closer to
2H 3S ; KH JD, which takes us a step towards understanding that this string represents hands in a card game.
Parsing is the process of taking a string of characters and converting them into an Abstract Syntax Tree (or, AST). This is a representation of the structure of the expression, for example:
OperatorNode( operator: "+", left: OperatorNode( operator: "/", left: 1, right: 2 ), right: 3.4 )
or, as a diagram,
"+" / \ "/" 3.4 / \ 1 2
Such a tree is a first step towards computing the value of the expression, or rendering it beautifully.
So, how does one create an AST? At Desmos we use the approach described by Vaughan Pratt. This article will begin with what is hopefully a clear and concise explanation of how Pratt Parsing works. We will then explain our motivations for adopting this technique at Desmos and compare it to the jison parser generator, our previous approach.
Finally, we provide a sample implementation of the parser (and a lexer) in Typescript, integrated with CodeMirror. We hope this will be a useful reference and starting point for anyone interested in doing parsing in the browser.
How does it work?
Our parse function will operate over a
tokens object. This is a sequence of tokens, like
[1, "/", 2, "+", 3.4] that is generated from our input through a process called lexing. We will not go into the details of lexing here, other than to point you at our sample implementation.
tokens object is a token stream, which allows us to
consume a token, returning the next token and advancing the stream. We can also peek a token, which gives us the next token without advancing the stream.
[1, "/", 2, "+", 3.4].consume() -> 1, ["/", 2, "+", 3.4] [1, "/", 2, "+", 3.4].peek() -> 1, [1, "/", 2, "+", 3.4] .consume() -> empty token,  .peek() -> empty token, 
Let’s start with a recursive call and fill things out as we go along. We will present our approach in pseudocode, but you are welcome to reference the Typescript implementation as we go along.
function parse(tokens): firstToken = tokens.consume() if firstToken is a number leftNode = NumberNode(firstToken) otherwise Error("Expected a number") nextToken = tokens.consume() if nextToken is an operator token rightNode = parse(tokens) return OperatorNode( operator: nextToken, leftNode, rightNode ) otherwise return leftNode
We expect a number token followed by an optional operator. We then perform a recursive call to find the sub-expression to the right. So far so good – we start getting an idea of how parsing an expression like
3 * 2 + 1 might work:
parse [3, "*", 2, "+", 1] 1 consume 3 as a NumberNode into leftNode 1 consume "*" to be nextToken 1 rightNode = parse [2, "+", 1] 2 consume 2 as a NumberNode into leftNode 2 consume "+" to be nextToken 2 rightNode = parse  3 consume 1 as a NumberNode into leftNode 3 no next token 3 return: 1 2 combine 2, "+" and 1 into an OperatorNode 2 return: "+" / \ 2 1 1 combine 3, "*" and rightNode into an OperatorNode 1 return: "*" / \ 3 "+" / \ 2 1