I spent part of my vacation last month working on my web site. One change I wanted to make was to put a banner on every page, and a directory-specific column on the left. Nothing fancy by web design standards, but an adventure for a hobbiest without a CMS.

The first time I had to make a number of pages with a common template was in 1995, and I wrote a Lisp program to generate them. The next time I made a web site, I used Barry Warsaw’s Python program to create some pages. (I still didn’t have access to a server with server-side includes.)

Last year when I moved my site to a modern server, I used PHP include to add navigation elements to some of the pages. But this time I didn’t relish editing a number of HTML files, and, more to the point, I was ready to try something new.

The Problem

Let’s say that software/index.html looks like this:

<h1>This is my software page</h1>
<p>Welcome to my software page</p>

Furthermore, let’s say that there’s a file categories.html in the same directory:

<ul class="categories">
  <li><a href="lisp">Lisp software</a></li>
  <li><a href="python">Python software</a></li>
</ul>

I would like a request for http://osteele.com/software/index.html to retrieve this content:

<html>
...
<ul class="categories">
  <li><a href="lisp">Lisp software</a></li>
  <li><a href="python">Python software</a></li>
</ul>
<h1>This is my software page</h1>
<p>Welcome to my software page</p>
...
</body>
</html>

where the ... and ... are generated by header.php and footer.php.

A conventional solution would be to modify index.html so that it included some header and footer material as well as the actual content:

<h1>This is my software page</h1>
<p>Welcome to my software page</p>

Of course, then I have to edit all my other files too, and any new files I add. And if I ever change the header and footer includes, I have to edit them all again. (For example, it’s likely that include('header.php') will become include($_ENV['DOCUMENT_ROOT'] . 'header.php') — now how many files will I need to touch?)

The include lines are domain-specific boilerplate. Like all boilerplate, they distract for the content of my source files, they add to the overhead of creating and maintaining the site, and they’re a chance for something to go wrong.

It seems backwards to edit the inside of a number of files just to control what gets shown on the outside (before the beginning and after the end). If each of the files is treated the same, something outside the file itself should do the wrapping.

Wrapping with RewriteRule

I used the Apache mod_rewrite module to solve this problem. I configured my .htaccess file so that a request for a URL with the path /index.html is rewritten into a request for the file /wrap.php. Within the body of wrap.php, the expression $_ENV['REQUEST_URI'] evalutes to the original path, in this case /index.html. The code in wrap.php can do whatever it wants with this pathname — show the original file, show just its name, or process it in some more sophisticated way.

My .htaccess file contains the following:

RewriteEngine On
RewriteBase /
RewriteRule ^.*.html$ /wrap.php [QSA]

and wrap.php looks like this:

wrap.php

<?php
  $file = $_ENV['DOCUMENT_ROOT'] . $_ENV['REQUEST_URI'];
  $sidebar = dirname($file) . '/sidebar.html';
  include('header.php');
  if (file_exists($sidebar)): include($sidebar); endif;
  include($file);
  include('footer.php');
?>

This strategy has several advantages over modifying every content file. The includes can be added or removed from any number of files without editing each file (just modify the regular expression in the RewriteRule); the includes don’t need to be copy/pasted to each existing file and each new file; and the includes can be maintained separately from the content.

AOP

The problem that motivated this — how to inject behavior into a set of source entities without modifying each one — is the same as one of the motivations for Aspect-Oriented Programming (AOP).

A “Hello World” of AOP is the problem of adding timing code to a set of methods. Adding timing code by hand to a single method is, like the header/footer problem above, a simpe matter of wrapping the method’s implementation at the source level. A method such as this:

void f() {
  do_something();
}

is replaced by this:

void f() {
  long startTime = System.currentTimeMillis();
  try {
    do_something();
  } finally {
    Logger.logTime(startTime, System.currentTimeMillis());
  }
}

AOP lets us leave the method body source unchanged, and place the timing code somewhere else (in a method on Aspect or Interceptor, depending on the AOP framework.) The timing code has a hole in it where the original method is spliced in — this is like the parameter to a function or macro. The location of this hole is signalled in the source by the proceed keyword in AspectJ, the JoinPoint.proceed() method in AspectWerkz, or the Invocation.invokeNext() method in JBossAOP.

Using AOP to insert the timing code has three advantages over modifying the method source: the timing code can be applied and removed without modifying the source; the same timing code can be applied to a number of methods without repeating it; and the timing code can be maintained separately from the methods. These are the same advantages of the RewriteRule+PHP solution to wrapper pages.

RewriteRule and AOP

In the language of AOP, the code that is injected into a program is advice. The location where it is injected is a join point. A set of locations is a pointcut; pointcuts are generally defined as a pattern or predicate that selects locations from a program.

In the timing example, the timing code (long time = ..., etc.) is the advice, the f method is the join point, and the pointcut is a specification such as call(void C.f(void)) (in AspectJ, and if f is a method on C).

In the web page wrapper, the advice is wrap.php, the pattern in the RewriteRule is the pointcut, and the join points are the URLs that that pattern matches. The statement include($_ENV['REQUEST_URI']) in wrap.php plays the same role as proceed or invokeNext() in Java AOP: it specifies where in the advice to insert the code at the join point.

Concept Java example Web example
advice time = ... include("header.php");
pointcut call(...) RewriteRule .*.html
join point f() /index.html
hole proceed include

Limitations

RewriteRule, not surprisingly, doesn’t do everything an AOP framework does. One way it falls short is that it only implements advice, not aspects. (An aspect is a set of advices, designed to be applied in concert.) It only implements around advice — although given around, it’s trivial to implement before and after by placing the include at the beginning or end.

Perhaps most seriously, it’s not possible to apply multiple advices to the same file. (Or at least, I haven’t figured out how.) In the example above, I listed the rewrite rule as RewriteRule ^.*.html$ /wrap.php [QSA], and used $_ENV['REQUEST_URI'] in wrap.php to retrieve the original URL. Since my actual .htaccess contains other rewrite rules too, I actually use RewriteRule (.*.html)$ /wrap.php?file=$1 [QSA] to rewrite the URL, and $_GET['file'] to retrieve the original URL. Either of these methods only works for one rewrite per URL, though.

This limitation means it’s not actually possible to factor a number of crosscutting concerns out of a page — just one. Certain other concerns can be factored out because RewriteRule and RewriteCond can check request headers and test and set variables, but in general you’re stuck with one general-purpose solution (aspects), and a few special-purpose solutions for whatever mod_rewrite handles. Of course, you can always use traditional OO techniques in the server-side code that you do weave together, to handle everything else in a non-AOP manner.

Summary

There’s something of a dancing bear quality to this: .htaccess is enough like assembly language that you can implement an aproximation of something high-level in it! (I remember being thrilled by Z-80 assembly language as a big step up from Dartmouth Basic, because now I could use subroutines for the first time.)

What I find interesting is that this wasn’t the motivation at all. I didn’t decide “let’s port AOP to .htaccess, just to see if it can be done”. Instead, I implemented wrapper pages the laziest way I could think of (where “lazy” means I didn’t have to do anything repetitive, now or in the future), and the result turned out to be closely analogous to AOP.

The stages of web site generation techniques that I gave at the beginning of this post — write a program to generate pages statically, use somebody else’s static page generation program, generate the pages dynamically, and finally weave the code that does the generation together dynamically — recapitulates the history of programming techniques in general. First there were a few domain-specific special-purpose code-generators, then compilers for general-purpose static languages, then dynamic languages caught on, and finally — as the binding time for everything, including program construction, gets later — AOP.

Web programming really is web programming.