A Simple mod_rewrite Tutorial: SEO-Friendly, Attractive URLs
By Jeremy Lindblom
On July 23rd, 2008
As an intern here at Synapse Studios, I used to be naive about some things. [*Some* things? *Used* to be?—Ed] I’d visit websites, look at their URLs and wonder about the time and planning they must have taken in creating so many directories. Take this news article from Yahoo for example: http://news.yahoo.com/s/ap/20080715/ap_on_bi_ge/oil_prices
Does news.yahoo.com really have the directory structure reflected by this URL? Of course not. That would create an astronomical number of directories in a website with thousands of index.html files. (And it wouldn’t make any sense to begin with; that’s what query strings and dynamic pages are for.) But there’s a way to create legible, sensible URLs that mirror a directory structure but really pass variables to a dynamic page. It’s called a rewrite engine. Through the use of a rewrite engine, we can create URLs that make sense to people AND that are attractive to search engines. Take a look at how, after the jump.
Apache includes a rewrite engine called mod_rewrite. Since we use Apache as our web server of choice, we’ll be covering how to configure mod_rewrite.
Mod_rewrite is an interesting beast. Some call it “magic” and some call it a “nightmare”. Really, it’s a bit of a magical nightmare that, when skillfully and practically employed, becomes a powerful tool. It’s not a cure-all for every problem with URLs, but it’s great for doing many things. You can use mod_rewrite to:
- Prevent hotlinking
- Create pretty and SEO-friendly URLs, eschewing the use of illegible query strings
- Create redirect pages
Redirection is handy because you can move pages if need be, maintaining the old URL while serving a page from the new address. This removes the need for a redirection server-side script, client-side script or meta-refresh and keeps your search engine rankings from taking a hit.
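As a rough illustration of the redirect use case, rules along these lines would do the job. The file names old-page.html and new-page.html are made up for the example, and the two options are alternatives, not something to use together.

```apache
RewriteEngine On

# Option 1: internal rewrite. The visitor keeps seeing old-page.html
# in the address bar, but Apache serves new-page.html.
RewriteRule ^old-page\.html$ new-page.html [L]

# Option 2 (instead of option 1): external redirect.
# R=301 marks the move as permanent and sends the browser to the new URL.
# RewriteRule ^old-page\.html$ /new-page.html [R=301,L]
```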
However, in this tutorial I want to focus on how to use mod_rewrite to replace illegible query strings. This tutorial is not comprehensive, but it should provide you with a place to start from and teach you how to do basic rewrites.
Our Goal
Let’s say you have written a PHP script that lists the top 10 restaurants of any city in any state. You are very happy with your excellent script and you have spent several years traveling across the United States on unicycle to personally collect this data. Let’s say your website is www.best-food-of-the-usa.com. Well, the URL to look at the best restaurants in Mesa, Arizona, is pretty ugly looking:
http://www.best-food-of-the-usa.com/index.php?operation=top&state=arizona&city=mesa&limit=10
Blah! Let’s see what we can do with mod_rewrite to make that look nicer, more appealing and comprehensible to the user, and more search engine friendly. (Search engines take the file name and path into account when ranking pages, so putting relevant information in the path itself instead of the query string can help your rankings.)
Enable mod_rewrite
The first step to using mod_rewrite on your website is to make sure it’s installed and enabled. If you have a system administrator, they should be able to do this for you. If you don’t, you’ll need access to your httpd.conf and you’ll want to uncomment the line that reads:
#LoadModule rewrite_module modules/mod_rewrite.so
by removing the #. (Or add the line if it doesn’t exist.) You’ll need to restart Apache for this change to take effect.
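After the edit, the line should look something like this. The modules/mod_rewrite.so path is simply the one from the commented-out line; on your system the module may live somewhere else.

```apache
# httpd.conf, with the leading # removed so the module gets loaded
LoadModule rewrite_module modules/mod_rewrite.so
```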
Create an .htaccess File
Create a plaintext file and add the following two lines:
Options +FollowSymLinks
RewriteEngine On
The first line turns on the FollowSymLinks option, which mod_rewrite generally needs in order to work from an .htaccess file, and the second line switches the rewrite engine on. Save this file as .htaccess. Make sure that you do not accidentally name it .htaccess.txt. (If you’re creating the file in Notepad, make sure you select “All Files” from the file type dropdown when saving it. If you’re creating the file in vi on the server, you should already know what’s up, and it won’t try to add an extension automagically.) Put this file in the directory with your script.
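One thing that commonly trips people up: Apache only honors these directives in an .htaccess file if the main configuration allows it. A minimal sketch of the relevant httpd.conf section follows, assuming your site lives in /var/www/html (a hypothetical path; substitute your own document root).

```apache
# httpd.conf (or a virtual host file): let .htaccess files under this
# directory set Options and use mod_rewrite directives.
# /var/www/html is an assumed document root, not a required location.
<Directory "/var/www/html">
    AllowOverride FileInfo Options
</Directory>
```

As with the LoadModule change, Apache needs a restart before this takes effect.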
Rewrite Rules & Regular Expressions
Now we are ready to start writing some basic rewrite rules. Each rule follows this form:
RewriteRule PATTERN DESTINATION [FLAGS]
The PATTERN is what we are looking for and the DESTINATION is what we are going to rewrite to. FLAGS are options you can set for each rewrite… optionally. FLAGS are beyond the scope of this tutorial and will have to wait for the next.
Here’s a small example:
RewriteRule ^home\.html$ index.html [R,NC,L]
This rewrite rule redirects visitors who request home.html to the index.html page. Mod_rewrite makes use of regular expressions (regexes, for short) to parse your PATTERN element. If you’re not familiar with regexes, they’re extremely powerful expressions that make use of special characters to denote pattern-types to search for, like ranges, white space, numbers and more. They’re a bit difficult to wrap your head around at first, so let’s look at this example to start.
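A full treatment of the bracketed flags is out of scope here, but as a quick gloss of the three used in that rule (going by the meanings given in Apache’s mod_rewrite documentation):

```apache
# R  = redirect: send the browser to the new URL (a 302 unless a status is given)
# NC = no case: match home.html, HOME.HTML, Home.Html, and so on
# L  = last: stop processing rewrite rules once this one matches
RewriteRule ^home\.html$ index.html [R,NC,L]
```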
The PATTERN portion of the rewrite rule typically begins with a caret, ^, and ends with a dollar sign, $. These symbols anchor the pattern to the beginning and end of the requested URL. Both are optional, and including or excluding them has different effects. If only the caret (^) is used, our PATTERN must appear at the beginning of the URL. If only the $ is used, our PATTERN must appear at the end of the URL. If both are used, our PATTERN must match the URL in its entirety, and if neither is used, our PATTERN can appear anywhere inside the URL. (In an .htaccess file, the “URL” being matched is the path relative to the directory holding the file, without the domain name or query string.) For most of the rest of this tutorial, we are going to leave off the caret so that the PATTERN is matched only at the end of the URL.
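To make the four anchoring cases concrete, here are the variations side by side, using the home.html example from above. They’re meant as alternatives to compare, not as a set of rules to paste into one file.

```apache
# Both anchors: matches only the path "home.html"
RewriteRule ^home\.html$ index.html

# Caret only: matches paths that begin with "home.html",
# such as "home.html" or "home.htmlx"
RewriteRule ^home\.html index.html

# Dollar sign only: matches paths that end with "home.html",
# such as "home.html" or "old/home.html"
RewriteRule home\.html$ index.html

# No anchors: matches "home.html" anywhere in the path
RewriteRule home\.html index.html
```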
Next we have the period, (.). (In case you didn’t know what symbol I was referring to there…) The period is used in regular expression syntax to indicate “any single character.” Since that’s not what you mean in the example home.html, you need to tell the parser to treat it as a literal period. You do this with a backslash, \, directly in front of EACH character you need to escape. Because your DESTINATION doesn’t go through a regex parser, you don’t need to escape periods on the destination side of things.
If you find your PATTERN breaking, you’re probably going to want to give it a once-over and escape any non-alphanumeric characters with a backslash. Chief among these: ^ $ [ ] ( ) { } ! \ + * ? | and the period. (That’s right: if you want a literal backslash in your PATTERN, you need to escape it… with a backslash.)
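Here is a small before-and-after sketch of what escaping changes, again using home.html. The two rules are shown for comparison only.

```apache
# Unescaped: the period matches any single character, so this pattern
# also matches "home-html" and "homeXhtml"
RewriteRule ^home.html$ index.html

# Escaped: only a literal period matches, so only "home.html" qualifies
RewriteRule ^home\.html$ index.html
```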
Our first rewrite rule example wasn’t too helpful; it redirected users but didn’t rewrite a query string. What we need are rules that can apply to variable incoming requests. Let’s suppose we take any page and redirect it to index.html. This is a useful way to create a front controller pattern. We would write the rule like this:
RewriteRule (.+)\.html$ index.html
You use parentheses to group things in your regular expressions. In this case we have created a group containing “.+”. The period is interpreted as “any character” and the plus means “1 or more times”. So all together, our PATTERN is (any character one or more times).html. That matches any page with an html extension. If we wanted to include htm extensions, we would change our rule thusly:
RewriteRule (.+)\.(html|htm)$ index.html
The pipe character ( | ) is used to represent a logical OR. That means our pattern can end in “html” OR “htm”.
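If you go the front controller route, you’ll usually want requests for real files (images, stylesheets and the like) to pass through untouched. One common way to sketch that uses mod_rewrite’s RewriteCond directive, which this tutorial doesn’t otherwise cover, so treat the following as a starting point under that assumption and not as a drop-in solution.

```apache
RewriteEngine On

# Only rewrite when the request does not map to an existing file...
RewriteCond %{REQUEST_FILENAME} !-f
# ...or to an existing directory
RewriteCond %{REQUEST_FILENAME} !-d
# Everything else ending in .html or .htm goes to the front controller
RewriteRule (.+)\.(html|htm)$ index.html [L]
```

The conditions apply only to the RewriteRule that immediately follows them.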
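Range brackets are also very useful. Let’s say we only wanted to accept pages whose names consist entirely of alphabetical characters. Then we would write our rule as follows. (Here we do need the caret; without it, a name like 123abc.html would still match, because abc.html sits at the end of it.)
RewriteRule ^([a-zA-Z]+)\.html$ index.html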
The range is shown by putting values within square brackets. You can choose specific values like [abc] (only allows a, b, or c) or you can do ranges like [a-g], [0-9] or [n-y]. Now that we’ve covered some regex basics, we can build a pattern that can accept variables within a “directory structured” URL.
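A couple more character classes, sketched in the same style. The patterns are illustrative only; pick whichever set of characters your own URLs actually use. Both are anchored with a caret so that the whole page name has to fit the class.

```apache
# Names made up of digits only, e.g. "2008.html"
RewriteRule ^([0-9]+)\.html$ index.html

# Lowercase letters, digits, and hyphens, e.g. "new-york-2008.html".
# The hyphen goes last inside the brackets so it isn't read as a range.
RewriteRule ^([a-z0-9-]+)\.html$ index.html
```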
Pretty URLs
Our example URL from earlier looks like this:
http://www.best-food-of-the-usa.com/index.php?operation=top&state=arizona&city=mesa&limit=10
We want to convert it to this:
http://www.best-food-of-the-usa.com/arizona/mesa/top10.html
However, we want to use the same rewrite rule for any city and state. To do this we will use regular expressions, but we will also use variables in our DESTINATION that hold the values from the groups in our PATTERN. Whenever we use a group (groups are enclosed in parentheses) in our PATTERN, the string matched by that group can be accessed through a variable in our DESTINATION. That way, if our group ([a-zA-Z]+) matches the string “arizona”, we will be able to use that string in our DESTINATION.
RewriteRule ([a-zA-Z]+)/([a-zA-Z]+)/top10\.html$ index.php?operation=top&state=$1&city=$2&limit=10
This will rewrite our pretty URL to the URL needed by our script. You will notice within our DESTINATION that we have state=$1 and city=$2. The $1 and the $2 represent the values from the first and second groups in our PATTERN, respectively, and contain whatever strings matched. In the case of our example URL, $1 would contain “arizona” and $2 would contain “mesa”.
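Putting the pieces together, the complete .htaccess file might look roughly like this. The second rule is a hypothetical extension, not something the script above requires: it assumes you’d also like URLs such as /arizona/mesa/top5.html to work, and it captures the number as a third group. The character class is written [a-zA-Z] so that it matches letters only.

```apache
Options +FollowSymLinks
RewriteEngine On

# /arizona/mesa/top10.html -> index.php?operation=top&state=arizona&city=mesa&limit=10
# L = stop processing further rules once this one matches
RewriteRule ([a-zA-Z]+)/([a-zA-Z]+)/top10\.html$ index.php?operation=top&state=$1&city=$2&limit=10 [L]

# Hypothetical variation: take the limit from the URL as well ($3),
# so top5.html, top25.html, and so on all reach the same script.
# It also covers top10.html, so the first rule becomes redundant if you adopt it.
RewriteRule ([a-zA-Z]+)/([a-zA-Z]+)/top([0-9]+)\.html$ index.php?operation=top&state=$1&city=$2&limit=$3 [L]
```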
Conclusion & Other Resources
We’ve set things up so that a good deal more readable URL gets rewritten to the ugly one our script needs. And we learned a little bit about regular expressions along the way. Regex can be a confounding mistress; extremely powerful but extremely confusing, depending on how complex your patterns are.
I highly recommend you take a look at Added Bytes’ mod_rewrite Cheat Sheet to get you started and as a handy reference. And their Regular Expressions Cheat Sheet is great for that side of things.
You can also take a look at Apache’s documentation on mod_rewrite:
Apache Mod_rewrite
And a few other guides on getting started:
Mod_rewrite: A Beginner’s Guide
URL Rewriting
Easy Mod Rewrite
If you’re getting stuck with regular expressions, post a comment and we’ll ask the regular expressions bear (that’s Edgar) to reply.
Discussion on this post
-
You know, it may be that Yahoo actually has files for the directory structure in your introduction, and has rewrite only apply to files that do not exist. The overhead of accessing a database is nontrivial to them. But that’s another feature of mod-rewrite - you can put static pages in places to address availability concerns.











