Visualizing Regular Expressions
Here’s something I’ve wanted for a long time. So I finally built it. reAnimator is a tool for visualizing how regular expression engines use finite-state automata to match regular regular expression patterns against text.
This is intended to demonstrate the implementation of regular expressions. If you want to learn how to use them instead, I recommend these references instead:
A.M. Kuchling’s Regular Expression HOWTO
Steve Mansour A Tao of Regular Expressions
Jeffrey Friedl’s Mastering Regular Expressions (Amazon)
The Regular Expression Library’s list of resources
New My own reWork online regular expression workbench.
The User Interface
The screen has these areas:
The “pattern” shows the regular expression. Click on it to set another regular expression to match against.
The “input” is the string that is matched. As you type into the input string, the color of this string indicates whether it’s a complete match (green), a partial match (blue), or a non-match (red).
There are two graphs, which each display a finite-state automaton (FSA) that corresponds to the regular expression in the “pattern” area. As you type into the “input” area, the graphs also update, to display the state of the match.
A deterministic FSA (a DFA) is like a board game, with a counter that is moved according to the successive letters of the input string. The counter starts at the initial state (the leftmost circle with the arrow from off the board). Each consecutive letter of the input string tells where to move the counter to next. If the counter ends up in a terminal state (a double circle), there was a match.
A nondeterministic FSA is the same except that when there’s more than one legal move, you take them both.
The non-deterministic FSA bears the most direct resemblance to the regular expression. In exchange for this simplicity in its construction, it’s more complex to evaluate: instead of keeping track of just one counter, you have to keep track of a set of them. (This is an instance of the compile-time versus execution-time trade-offs that computer science is rife with.)
If you want to learn more about finite-state automata, the wikipedia entry has some useful information, but this also seems like a good place to plug my father-in-law’s book.
The front end is written in OpenLaszlo, and compiled to Flash. It uses AJAX and JSON to request the FSAs and the graphs.
CGI vs. FastCGI: I’ve been doing most of my server-side programming with either PHP or FastCGI and Ruby, so that pages don’t take so long to serve. This is the first Python service I’ve deployed since I started using FastCGI, and I was planning to use it.
But Python doesn't include FastCGI in its library, and the fact that there were four different third-party libraries with different APIs, none of them endorsed, and that the [latest PEP to mention FastCGI](http://www.python.org/peps/pep-0222.html) was deferred five years ago, made me unwilling to take on the project of evaluating them.I’m glad to hear that I was completely off base about Python and FastCGI. See Phillip J. Eby’s comment below. I stuck with CGI, which is in the standard library.
OpenLaszlo vs. DHTML: It would be just as easy to draw the graph itself using the canvas class in DHTML. I balked at doing the animation and user interface in DHTML, though. (There’s little touches like laying the graphs out horizontally only if there’s enough room, which were only a few lines of declarative code in OpenLaszlo.) And then it wouldn’t have worked on as many browsers. I decided to wait until OpenLaszlo compiles to DHTML, for a DHTML version of this.
PyFSA vs. …: Higher-performance (C) implementations of FSA minimization and determinization exist. I went with my own because it’s the only one I know of that has the option of preserving source location information across transformations. I use source location information minimally in the interface, and might add more.
Thanks to Margaret Minsky and Gary Drescher for commenting on a draft of the application. I used Patrick Logan’s json-py for server-side JSON. The credits for GraphViz are here. Thanks to my former colleagues at Laszlo Systems, Inc. for helping create the OpenLaszlo platform. I adapted Philip J. Scheider’s code for subdividing cubic beziers; this is a compact implementation of de Casteljau’s algorithm for Algol-like languages. Thanks to Guido and company for Python. Lastly, since people always ask, I drew the architecture diagram with Omnigraffle (gee I wish I got a commission!) — which I like because I’m not much of a designer, and diagrams I draw with it are passable without much work.
Update: I changed the name to reAnimator. (It was reMatch.) Thanks to Apache RewriteRule for letting me do this without breaking the old URLs!