As an intern here at Synapse Studios, I used to be naive about some things. [*Some* things? *Used* to be?—Ed] I’d visit websites, look at their URLs and wonder about the time and planning they must have taken in creating so many directories. Take this news article from Yahoo for example: http://news.yahoo.com/s/ap/20080715/ap_on_bi_ge/oil_prices

Does news.yahoo.com really have the directory structure reflected by this URL? Of course not. That would create an astronomical number of directories in a website with thousands of index.html files. (And it wouldn’t make any sense to begin with; that’s what query strings and dynamic pages are for.) But there’s a way to create legible, sensible URLs that mirror a directory structure but really pass variables to a dynamic page. It’s called a rewrite engine. Through the use of a rewrite engine, we can create URLs that make sense to people AND that are attractive to search engines. Here’s how.

Apache includes a rewrite engine called mod_rewrite. Since we use Apache as our web server of choice, we’ll be covering how to configure mod_rewrite.

Mod_rewrite is an interesting beast. Some call it “magic” and some call it a “nightmare”. Really, it’s a bit of a magical nightmare that, when skillfully and practically employed, becomes a powerful tool. It’s not a cure-all for every problem with URLs, but it’s great for doing many things. You can use mod_rewrite to:

  • Prevent hotlinking
  • Create pretty and SEO friendly URLs, eschewing the use of illegible query strings
  • Create redirect pages

Redirection is handy because you can move pages if need be, maintaining the old URL while serving a page from the new address. This removes the need for a redirection server-side script, client-side script or meta-refresh and keeps your search engine rankings from taking a hit.
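As a rough sketch of the first and last items in that list, the rules below assume mod_rewrite is already enabled and that they live in an .htaccess file at the site root. The page names and example.com are placeholders, not anything from a real site.

```apache
RewriteEngine On

# Hypothetical example: /old-menu.html has moved to /new-menu.html.
# R=301 sends a permanent redirect; L stops processing further rules.
RewriteRule ^old-menu\.html$ /new-menu.html [R=301,L]

# Hotlink protection sketch: refuse image requests whose Referer header
# is set but does not point at our own site (example.com is a placeholder).
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com/ [NC]
RewriteRule \.(gif|jpe?g|png)$ - [F,NC]
```

Leaving the R flag off the first rule would turn it into an internal rewrite: the visitor keeps seeing the old URL while Apache quietly serves the new page.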

However, in this tutorial I want to focus on how to use mod_rewrite to replace illegible query strings. This tutorial is not comprehensive, but it should provide you with a place to start from and teach you how to do basic rewrites.

Our Goal

Let’s say you have written a PHP script that lists the top 10 restaurants of any city in any state. You are very happy with your excellent script and you have spent several years traveling across the United States on unicycle to personally collect this data. Let’s say your website is www.best-food-of-the-usa.com. Well, the URL to look at the best restaurants in Mesa, Arizona, is pretty ugly looking:

http://www.best-food-of-the-usa.com/index.php?operation=top&state=arizona&city=mesa&limit=10

Blah! Let’s see what we can do with mod_rewrite to make that look nicer, more appealing and comprehensible to the user, and more search engine friendly. (As search engines apply rank based on the file name and path, putting relevant information in the core path instead of the query string can help boost your ranks considerably.)

Enable mod_rewrite

The first step to using mod_rewrite on your website is to make sure it’s installed and enabled. If you have a system administrator, they should be able to do this for you. If you don’t, you’ll need access to your httpd.conf and you’ll want to uncomment the line that reads:

#LoadModule rewrite_module modules/mod_rewrite.so

by removing the #. (Or add the line if it doesn’t exist.) You’ll need to restart Apache for this change to take effect.
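For reference, here is roughly what the relevant part of httpd.conf might look like afterwards. The Directory block is an addition to the steps above: rewrite rules in an .htaccess file are only honored if AllowOverride permits them, and the directory path shown is a placeholder for your own document root.

```apache
# httpd.conf (the directory path below is a placeholder for your document root)
LoadModule rewrite_module modules/mod_rewrite.so

<Directory "/var/www/html">
    # Allow .htaccess files to use mod_rewrite directives (FileInfo)
    # and the Options directive (Options).
    AllowOverride FileInfo Options
</Directory>
```

Your server may already have a Directory block for the document root; in that case only the AllowOverride line needs checking.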

Create an .htaccess File

Create a plaintext file and add the following two lines:

Options +FollowSymLinks
RewriteEngine On

The first line turns on FollowSymLinks, which mod_rewrite requires when its rules live in an .htaccess file, and the second switches the rewrite engine on. Save this file as .htaccess. Make sure that you do not accidentally name it .htaccess.txt. (If you’re creating the file in Notepad, make sure you select “All Files” from the file type dropdown when saving it. If you’re creating the file in vi on the server, you should already know what’s up, and it won’t try to add an extension automagically.) Put this file in the directory with your script.

Rewrite Rules & Regular Expressions

Now we are ready to start writing some basic rewrite rules. Each rule follows this form:

RewriteRule PATTERN DESTINATION [FLAGS]

The PATTERN is what we are looking for and the DESTINATION is what we are going to rewrite to. FLAGS are options you can set for each rewrite… optionally. FLAGS are beyond the scope of this tutorial and will have to wait for the next.

Here’s a small example:

RewriteRule ^home\.html$ index.html [R,NC,L]

This rewrite rule redirects visitors from home.html to the index.html page. Briefly, the flags in square brackets mean: R sends an external redirect, NC makes the match case-insensitive, and L stops processing further rules once this one matches. Mod_rewrite makes use of regular expressions (regexes, for short) to parse your PATTERN element. If you’re not familiar with regexes, they’re extremely powerful expressions that make use of special characters to denote pattern-types to search for, like ranges, white space, numbers and more. They’re a bit difficult to wrap your head around at first, so let’s look at this example to start.

The PATTERN portion of the rewrite rule typically begins with a caret, ^, and ends with a dollar sign, $. This syntax anchors the pattern to the beginning and end of the requested path. (In an .htaccess file, that path is relative to the directory containing the file and has no leading slash.) These two symbols are optional, and including or excluding them can have different effects. If only the caret (^) is used at the beginning, our PATTERN must appear at the beginning of the URL. If only the $ is used, our PATTERN must be at the end of the URL. If both are used, our PATTERN must match the whole URL, and if neither is used, our PATTERN can appear anywhere inside the URL. To make our PATTERN exactly what we want, we are going to remove the caret so that the PATTERN is matched only at the end of the URL. This will be the case in the rest of this tutorial.
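To make the effect of the two anchors concrete, here is the home.html pattern in all four combinations. This is only a side-by-side comparison (you would use one of these, not all four), and the sample paths in the comments are made up.

```apache
# Alternatives for comparison; use one, not all four together.
# Paths are relative to the directory holding the .htaccess file.

# Both anchors: matches only "home.html".
RewriteRule ^home\.html$ index.html

# Caret only: matches paths that start with "home.html", such as "home.html.bak".
RewriteRule ^home\.html index.html

# Dollar sign only: matches paths that end with "home.html", such as "old/home.html".
RewriteRule home\.html$ index.html

# No anchors: matches "home.html" anywhere in the path, such as "old/home.html.bak".
RewriteRule home\.html index.html
```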

Next we have the period, (.). (In case you didn’t know what symbol I was referring to there…) The period is used in regular expression syntax to indicate “any single character.” Since that’s not what you mean in the example home.html, you need to tell the parser to treat it as a literal period. You do this with a backslash, \, directly in front of EACH character you need to escape. Because your DESTINATION doesn’t go through a regex parser, you don’t need to deal with that on the destination side of things.

If you find your PATTERN breaking, you’re probably going to want to give it a once-over and escape any non-alphanumeric characters with a backslash. Chief among these: ^ $ [ ] ( ) { } ! \ . * + ? | (That’s right: if you want a literal backslash in your PATTERN, you need to escape it… with a backslash.)
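Here is a small before-and-after sketch of escaping. The file names in the comments are invented for illustration.

```apache
# Unescaped: the period matches any character, so this also matches
# "homeXhtml" (a hypothetical path), not just "home.html".
RewriteRule ^home.html$ index.html

# Escaped: the period must be a literal period.
RewriteRule ^home\.html$ index.html

# Literal plus signs and parentheses, escaped the same way
# ("c++(draft).html" is a made-up file name).
RewriteRule ^c\+\+\(draft\)\.html$ index.html
```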

Our first rewrite rule example wasn’t too helpful; it redirected users but didn’t rewrite a query string. What we need are rules that can apply to variable incoming requests. Let’s suppose we want to take a request for any page and rewrite it to index.html. This is a useful way to create a front controller pattern. We would write the rule like this:

RewriteRule (.+)\.html$ index.html

You use parentheses to group things in your regular expressions. In this case we have created a group containing “.+”. The period is interpreted as “any character” and the plus means “1 or more times”. So all together, our PATTERN is (any character one or more times).html. That matches any page with an html extension. If we wanted to include htm extensions, we would change our rule thusly:

RewriteRule (.+)\.(html|htm)$ index.html

The pipe character ( | ) is used to represent a logical OR. That means our pattern can end in “html” OR “htm”.
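As an aside, the same two extensions could also be matched with the question mark quantifier, which this tutorial doesn’t otherwise cover. A question mark makes the character (or group) just before it optional. Treat this as an alternative sketch rather than a replacement for the rule above.

```apache
# "l?" makes the final "l" optional, so this matches ".html" and ".htm".
RewriteRule (.+)\.html?$ index.html
```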

Range brackets are also very useful. Let’s say we only wanted to accept pages whose names are made up of alphabetical characters. Then we would write our rule as follows:

RewriteRule ([a-zA-Z]+)\.html$ index.html

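The range is shown by putting values within square brackets. You can choose specific values like [abc] (only allows a, b, or c) or you can do ranges like [a-g], [0-9] or [n-y]. Keep in mind that because this PATTERN has no leading caret, only the characters immediately before .html have to be letters: a request for page2/about.html would still match on about.html. Now that we’ve covered some regex basics, we can build a pattern that can accept variables within a “directory structured” URL.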
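A couple more range examples, as a sketch. These use a leading caret so that the bracketed range has to cover the whole page name, which is a departure from the caret-less style used in the rest of this tutorial. The sample file names are made up.

```apache
# Digits only before the extension, e.g. "2008.html".
RewriteRule ^([0-9]+)\.html$ index.html

# Lowercase letters, digits, and hyphens, e.g. "oil-prices-2008.html".
# A hyphen placed last inside the brackets is taken literally.
RewriteRule ^([a-z0-9-]+)\.html$ index.html
```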

Pretty URLs

Our example URL from earlier looks like this:

http://www.best-food-of-the-usa.com/index.php?operation=top&state=arizona&city=mesa&limit=10

We want to convert it to this:

http://www.best-food-of-the-usa.com/arizona/mesa/top10.html

However, we want to use the same rewrite rule for any city and state. To do this we will use regular expressions, but we will also use variables in our DESTINATION by using the values from the groups in our PATTERN. Whenever we use a group (groups are contained in left and right parentheses) in our PATTERN, the string that gets matched in that group can be accessed by use of a variable in our DESTINATION. That way, if our group ([a-zA-Z]+) matches the string “arizona”, we will be able to use that string in our DESTINATION.

RewriteRule ([a-zA-Z]+)/([a-zA-Z]+)/top10\.html$ index.php?operation=top&state=$1&city=$2&limit=10

This will rewrite our pretty URL to the URL needed by our script. You will notice within our DESTINATION that we have state=$1 and city=$2. The $1 and the $2 represent the values from the first and second groups in our PATTERN, respectively, and contain whatever strings matched. In the case of our example URL, $1 would contain “arizona” and $2 would contain “mesa”.
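Putting the pieces together, a complete .htaccess for the restaurant site might look something like the sketch below. The third group (so that top5.html or top25.html would also work), the leading caret and the [L] flag are additions for illustration and go beyond the rule above; whether your script accepts other limit values is an assumption.

```apache
Options +FollowSymLinks
RewriteEngine On

# /arizona/mesa/top10.html -> index.php?operation=top&state=arizona&city=mesa&limit=10
# $1 = state, $2 = city, $3 = number of results (the third group is an
# addition for illustration; the rule in the tutorial fixes the limit at 10).
# The leading caret and the [L] flag are also additions.
RewriteRule ^([a-zA-Z]+)/([a-zA-Z]+)/top([0-9]+)\.html$ index.php?operation=top&state=$1&city=$2&limit=$3 [L]
```

Note that [a-zA-Z]+ only matches letters, so a city written with a hyphen or digit in the URL would need a wider range.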

Conclusion & Other Resources

We’ve rewritten our ugly URLs into something a good deal more readable. And we learned a little bit about regular expressions along the way. Regex can be a confounding mistress; extremely powerful but extremely confusing, depending on how complex your patterns are.

I highly recommend you take a look at Added Bytes’ mod_rewrite Cheat Sheet to get you started and as a handy reference. And their Regular Expressions Cheat Sheet is great for that side of things.

You can also take a look at Apache’s documentation on mod_rewrite:
Apache Mod_rewrite

And a few other guides on getting started:
Mod_rewrite: A Beginner’s Guide
URL Rewriting
Easy Mod Rewrite

If you’re getting stuck with regular expressions, post a comment and we’ll ask the regular expressions bear (that’s Edgar) to reply.
