Sunday, August 22, 2010

What in the World is a Closure

Browsing programming blogs and sites like reddit's programming section, I keep coming across a CS term that my schooling missed out on: closure.  It's one of the touted features of JavaScript.  (Which, as an aside, remains too low on my priority list for me to make much headway with.)  Folks seem to really like 'em, so in the interest of continuing education, I decided to see what in the world a closure could be.  Naturally, I visited Wikipedia first and got a browser full of code examples and this definition: "... a closure is a first-class function with free variables that are bound in the lexical environment."  Well, that's quite a mouthful.  So, like any good programmer faced with a daunting puzzle, I'll break it down into smaller parts.

First-class function is an easy one: it just means that functions are not just syntax but a true data type within the language.  If the language allows us to create functions, assign them, pass them, and otherwise treat them like any other data type available, then it supports first-class functions.  C doesn't, but it simulates some of the capabilities through the function pointer mechanism.  Lisp and other functional languages, of course, are built on the foundation of first-class functions.  So a closure is a first-class function, but that's not all.
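
To make "create, assign, pass" concrete, here is a minimal sketch in Python (my choice of language for illustration; nothing in the definition depends on it).  The names shout, apply_twice, and handlers are made up for the example.

```python
def shout(text):
    """Return the text in upper case with an exclamation mark."""
    return text.upper() + "!"

def apply_twice(func, value):
    """Call func on value, then call it again on the result."""
    return func(func(value))

# Assign the function to another name, just like any other value.
yell = shout

# Store it in a data structure.
handlers = {"loud": shout}

# Pass it as an argument to another function.
result = apply_twice(yell, "hi")
print(result)                  # HI!!
print(handlers["loud"]("ok"))  # OK!
```

Nothing here is a closure yet; the point is only that the function is a value that can travel.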

Free variables, again according to Wikipedia, are variables referred to in a subroutine that are neither local variables nor parameters to the subroutine.  This definition becomes a bit clearer when you think back to CS class and remember that a variable is bound when its name is tied to a particular declaration or value.  In other words, a free variable in a subroutine is one that is not bound within the scope of that subroutine.  The trivial example would be a function using a global variable; the global is a free variable in the scope of the function.
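
The trivial global example might look like this in Python.  This is only a sketch: rate and discount are hypothetical names, and the introspection at the end relies on CPython's code-object attributes, which are an implementation detail rather than part of the definition.

```python
rate = 0.5  # global: bound at module level, outside the function

def discount(price):
    # 'price' is a parameter and 'reduced' is a local, so both are bound here.
    # 'rate' is neither, which makes it free within discount.
    reduced = price * (1 - rate)
    return reduced

print(discount(100))  # 50.0

# CPython lists parameters and locals separately from the other names used.
print(discount.__code__.co_varnames)
print(discount.__code__.co_names)
```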

That bit at the end of the definition about "lexical environment" is another reference to variable scoping.  Basically, it's saying that the free variable in the subroutine is bound according to where the subroutine is defined in the source text, not where it happens to be called.  In our trivial global variable example, the global is bound in a scope that encompasses the function, the global one.  Assume for a moment that our example function was assigned to a variable, ready to be passed around, and you will see that it meets the requirements for a closure: it is a first-class function (or we couldn't assign it to a variable and pass it around), it has a free variable (the global), and that free variable is bound in the lexical scope where the function was defined.
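
A small Python sketch of what "lexical" buys us, with made-up names (greeting, greet, caller).  The caller has its own variable with the same name, but the function ignores it and uses the binding from the place where it was written.

```python
greeting = "hello"  # bound in the scope where greet is defined

def greet(name):
    # 'greeting' is free here; it resolves to the module-level binding above.
    return greeting + ", " + name

def caller(func):
    # This local shares the name, but greet never sees it: lookup follows
    # where greet was written, not where it is called.
    greeting = "goodbye"
    return func("world")

print(caller(greet))  # hello, world
```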

But wait, a closure isn't necessary for this example, because the global variable, by definition, is visible everywhere.  To make a more interesting example, picture a module (or package, depending on your preferred nomenclature) that defines a variable which is private within its namespace.  It also defines a procedure that uses the variable.  And finally, it exposes that procedure as an object through its publicly available interface.  When the procedure gets called outside of the module's scope, it still references the module's private variable, even though that variable isn't visible in the scope where the procedure is used.  So our function object is carrying with it a piece of state, not from the scope where it is being executed, but from the scope where it was defined.  This somewhat mind-bending ability is what defines a closure.
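
Here is a rough Python sketch of that module scenario.  As an assumption for the sake of a short example, an enclosing function (make_counter_module, a hypothetical name) stands in for the module, since its local variable really is invisible from outside; next_id plays the exposed procedure.

```python
def make_counter_module():
    """Stand-in for a module: 'count' is private to this scope."""
    count = 0  # not visible to any code outside make_counter_module

    def next_id():
        # 'count' is a free variable of next_id, bound in the enclosing scope.
        nonlocal count
        count += 1
        return count

    # Expose only the procedure, not the variable.
    return next_id

next_id = make_counter_module()
first = next_id()
second = next_id()
print(first, second)  # 1 2

# There is no 'count' out here, yet the function still carries it along.
# CPython keeps the captured variable in a cell attached to the function.
print(next_id.__closure__[0].cell_contents)  # 2
```

Each call to make_counter_module produces a fresh count, so two function objects made this way carry independent state.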

For further reading, see the Wikipedia pages above and this post about closures from Martin Fowler.
