Recursive Descent Parsing

Early in the development of computers, programming languages were simple, and were converted into a machine usable form using a parse tree. Let me give you a little context.

I’ll simplify this as much as possible. If you were to give this directly to a developer, she’d probably point out that it’s got a few technical flaws. Compiler design isn’t really my space, though, so I think it’s forgivable. There are details that I’m leaving out. (For instance, the 8*8, since it always ends up being the same result, would be turned into 64 before it was run, avoiding the time to do the multiplication, in a process called, unsurprisingly, ‘optimization’)

Suppose I have a javascript command which sets the variable i to hold the number 64 by multiplying 8 by 8. In javascript, that looks like this:

var i = 8 * 8;

The eights are constants, the *,;,=,and var, are commands, and the i is a symbol. The parser would run against it, and produce a tree that can be followed to figure out what things are related to what… in this case, the tree would be something like:

So, when this is done, the little box in memory represented by i now holds 64. This is generated by a recursive descent parser. The way it is processed, internally, varies from compiler to compiler. (As does how it’s represented as it’s compiled and parsed)

Basically, when the interpreter runs, it will follow the tree (because there’s stuff above the part of it I represented above) It sees the commands in a different order than you do, we think ‘in order’ 8 * 8. It thinks in pre-order or post-order, depending on how it is written. I’ll use post-order in this description.

The parser follows the tree and builds, internally, a new representation which is easier to compute. It represents putting each thing onto an in-memory structure called a ‘stack’.

A stack works like the stack of dishes at a resturant. As each piece of information is put onto it, the previous piece of information is moved one place down. In post-order, this can be very handy. Suppose we had an in-order formula like, 5+10*2+10. Let’s put some parenthesis around that, to make some things easier to see. Remember that multiplication always happens before addition. So the above becomes 5+(10*2)+10, simplifies to 5+20+10, simplifies to 25+10, simplifies to 35. Simple.

Computers can’t do in-order very easily. So a parser would convert that to a parse tree and then into a pre or post-order version. Continuing in post-order, it would become 5 10 2 * 10 + +. Numbers and Operations(+,*,-,etc) all have the same relative importance in this. The operations are commands which will affect the stack, and the numbers are values which get placed on it. So, starting with this second formula, lets follow what happens.

Just to throw a little complexity onto this, I’ll add two pieces of computer jargon. Putting something onto a stack is called ‘push’, and taking something off of a stack is called ‘pop’. Let’s see how the computer will push and pop things on the stack. Generally operations will cause pushes and pops. So…

stack what we are doing
The stack starts out completely empty in this case
5 We’ll push the 5 on, that’s the first number
5 10 Now push the 10 on, just another number
5 10 2 We’ve got a few numbers on there now, what’s next?

Now, the * operator pops two numbers off the stack, multiplies them, and pushes the result, so…

stack what we are doing
5 10 2 * Here it is before the * happens ( 10 * 2 )
5 20 And now, here’s the stack AFTER the *
5 20 10 We push the next number

The + operator is analogous to the * in that it takes two values off the stack and puts one back so…

stack what we are doing
5 20 10 + BEFORE ( 20 + 10 )
5 30 After the +
5 30 + Hey, another +, since there are two in the formula
35 And now, the stack has the answer at the top

Pretty cool, huh? So, how about our “i = 8 * 8”? Well, lets take that
parse tree and make it into post-order: 8 8 * i =

Now, we’ll chew it up with the stack?

stack what we are doing
8 8
8 8 * Multiplication takes two, puts the result out
64 i
64 i =

Hmm, equals… it pops two, and THEN, puts the second item into the memory space represented by the first. See, when the computer sees i it really sees a memory location REPRESENTED by i. The variable i is just a place in it’s memory that has the label of i. Whenever you do something with i, it goes and gets the value stored at the location that is REPRESENTED by i. So, it takes the 64, looks up what memory location is represented by i, and puts it into that space.

So, now, after chewing through that parse tree, converting it to post-fix, and executing it… i = 64

Seems like a lot of work? Well, it is, but the computer does it with blinding speed. Every operation the computer does, ultimately, can be represented by operations on a stack. There are other ways of doing it, but that’s the easiest one to explain.

Once you have the parse tree, you can add or remove things from it. For instance, suppose one had a horrible interpreter in their browser. All modern browsers have javascript interpreters in them. It’s so you can make pages that do things like change images when you hover your mouse over them or such. Now, suppose someone had a web page that used some horrendously complex operations and she wanted to simplify her code in advance.

Remember the i=8*8? Well, suppose she had a chunk of javascript that was full of those sorts of things. Operations that resulted in constants./p>

Using the parse tree, she could search for things that were constant * constant, and replace them with the result. Using the parse tree, she could replace all of the 8*8s with 64s, and such. Find sections of code that never run, and remove them. That sort of thing.

The parse tree makes it trivial to find and fix those sorts of naive programming problems. It can be quite handy. For most developers, it’s not needed, but those that do need it can make awesome use of it.

(I realize that recursive descent parsers are rare in modern computer science, having been largely replaced with shift-reduce parsers due to the complexity of modern languages.)

Related Posts:

This entry was posted in Computing History, Humour, Programming and tagged , , , . Bookmark the permalink.

2 Responses to Recursive Descent Parsing

  1. Rick Cuddy says:

    man, I’m so tired.. I read that as “Recursive Decent Parenting”

  2. I will freely admit that sometimes, parenting is recursive.