By David Bernal
On July 25th, 2008
One of the more exciting (by which I mean soul-crushing, murderous-rage inducing) things about my job is getting to look at the terrible, terrible code that runs all sorts of different websites on the internet. Chances are, you wrote some of this terrible code; I know I did. It’s even possible that you still write such code. In case you do, go grab yourself a stiff drink (a strong vodka martini, or gin and tonic is recommended) and get comfortable, because I’m about to lay some edumacation on you.
Things You Should Never Do
First, I’m going to talk about some things I’ve come across that you should never, ever do. If you do these things and I ever have to work on your code, be prepared for the fury of ten-thousand burning suns to come crashing down around you, for I have warned you. Pay it forward by writing decent code, for one day you may find yourself having to maintain someone else’s heap of terrible code and woe, for you shall feel some tinge of guilt, having made other developers go through your new hell.
All of the snippets in this article were found in actual code being used in the wild. (And most of them come from a single, disastrous, amalgamation-of-fail file.)
Use the double dollar sign with register_globals
$Var_Name=$cust['id'];//this is a number, such as 5 $Var_Value=$$Var_Name;
Do you see what’s happening here? No? I didn’t either, until staring at it agape for about 10 minutes. As it turns out, there is a form that submits to this page with form fields like
<input name="5" ... />. Since register globals is on, this variable can be accessed as
$5. Accessing variables this way adds several levels of indirection to the code, makes it difficult to read, and difficult to debug. All this is aside from the obvious security issues associated with use of register globals. If you’re still on a server with register globals enabled, you need to get off. The security risks are too great and it’s time to bite the bullet and update your code to work without it.
Instead, access variables by using the superglobal arrays. It’s not any harder, and produces far more readable and more secure code.
Blind Database Queries: Queries Made Without Specified Field Names
Basically, this is when you do things like
SELECT * (bad without
E_NOTICE error reporting turned on) or
INSERT INTO schools VALUES(...) (bad always).
mysql_query("INSERT INTO schools VALUES( \"".$busid."\", \"\", \"".$busName."\", \"".$name."\", \"".$address."\", \"".$city."\", \"\", \"\", <snip> ".$bw." ) ");
There are actually so many problems with that snippet of code that my eyes are bleeding. Well, not so much bleeding, as leaking vitreous humor all over the place. BECAUSE THEY. EXPLODED. But the main thing I want to talk about is the lovely
VALUES part of the query.
The problem here is that the field each value is placed into is determined by the order in which it appears in the database. You’ll notice that there are several instances of
\"\" in that snippet. This is necessary when you write bad queries like this and find yourself wanting to write to a field that comes after a field or two you don’t want to write to. The
\"\" is a placeholder, leapfrogging those fields.
It’s cryptic in that, to determine what this query is doing, you have to actually go into the database and start counting fields. And it’s brittle in that, if I insert a field into the table, everything explodes, as you have to rewrite all of your queries to reflect the presence of a new field or ensure that the new field is inserted at the end of the table. Completely stupid, that. There are smart ways to be a lazy programmer and idiotic ways. Guess which one this is.
If that weren’t enough, there are additional problems with the query: It’s unclear if the variables have been escaped to prevent an SQL injection attack. Using
sprintf may be preferable to string concatenation and make sure you use mysql_real_escape_string where appropriate. Finally, rumor on the street is that prepared statements can be cached with greater efficiency than on the fly query strings. But let’s stick to issue of unnamed fields.
You should always list the fields you are inserting in the query after the table name. I actually had the pleasure of rewriting this one, so I’ll just show it below:
mysql_query("INSERT INTO schools(id, busname, name, address, city, <snip> ) VALUES(...) ");
This makes the query far more readable, and if I want to add a field later on, I just add it to the field list. Oh, and though you may be attempted to try
INSERT INTO ... SELECT * FROM ... to simplify an insert, it ranks as a pretty straightforward “bad practice,” even though SQL allows you to insert into fields straight up like this without defining them. Don’t do it. You’re better than that. Those who have to navigate your code will thank you later.
Using Switch Cases to Implement Your Entire Website in One File
This is one I come across surprisingly often, which leads me to think that new programmers are often tempted to implement a front controller pattern without having enough experience in object oriented programming to do it, you know, the right way. Typically for content pages, a name of a page is passed in the query string (or specified using mod_rewrite). That part is all fine and good but the trouble arises in how our budding programmers go about dispatching the requests.
Instead of implementing a sensible routing scheme that delegates different objects to handle the page, the programmer in this example performs a switch on the name of the page, and then performs the logic for each page inside of the switch case (in one massive file). It’s like we’re programming in QBasic here, without the benefits of
GOTO. This makes the site very hard to work with because for one, there is no simple way of just opening the file containing the logic for a given page and editing it. It is far more difficult understand code when it is structured in this way, as it is easy to become lost in the circuitous conditionals associated with a user going from page to page. (And often when hacking away at this file, I’ll find myself pasting chunks of it into a “scratchpad” or sorts, if not breaking down and just breaking it into component files outright, though sometimes that’s not a luxury we have.)
Another issue with this is if the developer wants to have certain logic added before the giant switch statement, he will typically add a few conditionals here and there that perform extra operations for certain pages. This kind of scope pollution makes it even more difficult to follow a variable through the execution path. Similar to this is the possibility that the page tends to have many exit points. In certain cases, for example, the programmer may wish to output HTML to the screen, while in others, she may wish to redirect the user to a different page. Because this is done in many different places, it becomes difficult to determine when and how execution of our massive switch statement ends.
That’s just scratching the surface, the problems with this solution flow like the vitreous humor from my wrecked eyes. Since so much functionality exists in a single page, the programmer can’t edit the file without risking taking down the entire site for want of, say, a semicolon. And because she is stupid enough to try this to begin with, she probably don’t have a dev environment set up. It also makes tracking down errors an immense pain in the ass and, to heap on the damage, makes multi-developer editing pretty much impossible.
There are a lot of alternatives to doing this. The simplest, and perhaps best approach for beginners is to just create separate
.php files for each page, using
require to bring in common logic and setup. Even using a switch statement, but including the rest of the logic is a big step up, although problems with namespace pollution can still arise in that scenario. More advanced programmers should look to using a Front Controller as described in Alur et al’s “Core Java2EE Patterns“. You can find similar PHP implementations in most of the many wonderful frameworks available for PHP. (cf. wikipedia for a big ol’ list.)
Things You Should Do, Often
There are actually several more examples I could give of horrible things people do in code, but I’d instead like to look at a few simple techniques you can use that will make my job much much easier when your shitty websites come my way.
Use Classes Or, For The Love Of All Things Holy, At Least Use Functions
Instead of just writing big long pages that do everything needed on that page, and then duplicating much of the same functionality on every single page, use classes and functions to logically group together functionality. I’d like to think that this is a no-brainer for anyone who’s attempted to apply logic to the process of application development, but for those of you getting started, I’ll spell out why you want to take the time to break things up.
Instead of writing the code for a page that gets a database connection, creates the query, performs the query, and checks the results on every single page, write a function called
saveUserData that does all that for you. Better yet, write a
User class with a
save method that allows you to work with the data like a plain old object, and that does the saving for you. This means that you write significantly less code (simple pages on sites we do are 100-200 lines long, versus 500+ lines long on poorly-written sites), and the code you write is more resilient to changes on the backend (switching databases, changing schemas, changes in business logic, etc.). Furthermore, your code is far more readable, because each segment of code has a clear purpose that is only defined once.
Learn About Different Ways to Work With Strings
PHP has a lot of ways to work with strings. You can use double quotes, single quotes, heredocs, and even output buffering. Each of these has its uses, and you should be familiar with the advantages and disadvantages of each. In general, though, I recommend singles quotes for short strings (things like SQL queries), and output buffering for longer strings. The only time I use double quotes is for escaped sequences which wouldn’t be expanded in a single quoted string. To illustrate these two techniques, I’ll list a couple of examples.
Single Quotes vs. Double Quotes
In this first example, we’ll take the SQL query from above and show what it looks like using double quotes and using single quotes.
mysql_query("INSERT INTO schools VALUES( \"$busId\", \"\", \"$busName"\, \"$name\", \"$address\", \"$city\", \"\", \"\", <snip> $bw) ");
Using double quotes, the variables are buried in the string literal. This means, for example, that most text editors and IDEs won’t highlight the variables. The lack of separation between variables and string literals makes both harder to pick out. Additionally, anywhere that you need double quotes (often needed both for HTML and SQL queries) has to be escaped. This is ugly. Finally, PHP actually parses the string literal to try to determine if there is a variable embedded in it somewhere. This can have a slight performance impact, but you probably wouldn’t notice it.
Alternatively, we could use single quotes:
mysql_query('INSERT INTO schools VALUES( "'.$busId.'", "", "'.$busName.'", "'.$name.'", "'.$address.'", "'.$city.'", "", "'., <snip> '.$bw.') ');
In this case, we get to use double quotes as much as we want. This saves typing and makes the code easier to read. Additionally, there is clear separation between variables and string literals, making the code more readable. And PHP knows what to expect and doesn’t have to parse your string for variables.
I also see a lot of new PHP programmers (including myself, when I was still green), use the
echo construct with long strings in order to generate HTML.
$foo=$baz->bar($bing, $bot, $bor); echo "<div align='center'> <table border='0' cellpadding='4' cellspacing='1' width='100%' id='table1' bordercolor='#000080' bgcolor='#000080'> <tr> <td><b><font size='4'><font color='#FFFF00'>$foo</font> <font color='#FFFFFF'>$foo.</font></font></b><font color='#FFFFFF'>Yup, it sure is.</font></td> </tr> </table> </div>";
This makes it very difficult to use double quotes anywhere. This makes coding unnatural and cumbersome, since double quotes appear in many places. Most editors I’ve used don’t have any way of knowing you’re working in HTML, which makes it impossible to use their syntax highlighting and code folding features.
Instead, it’s better to leave PHP and write the HTML outside of the PHP environment. If the HTML is needed in a variable (e.g., for returning from a function),
ob_get_clean() can be used.
$foo=$baz->bar($bing, $bot, $bor); ob_start(); ?> <div align='center'> <table border='0' cellpadding='4' cellspacing='1' width='100%' id='table1' bordercolor='#000080' bgcolor='#000080'> <tr> <td><b><font size='4'><font color='#FFFF00'>$foo </font> <font color='#FFFFFF'><?php echo $foo; ?>.</font></font></b><font color='#FFFFFF'>Yup, it sure is.</font></td> </tr> </table> </div> <?php $baz=ob_get_clean();
Alternatively, you could put all of the HTML into an external file that you include. By wrapping the
include in output buffering, you will still be able to capture the output into a variable while keeping the HTML easy to read and edit for developers and designers alike.
Document your code. Do it. I’m serious. Use JavaDoc syntax (for PHPDocumentor) and document the hell out of it. Use comments to identify what things are, and use them to describe what you’re doing in situations where your intent is not exceedingly obvious from the code. Some say that if your code is too complex to be understood by looking at it, then instead of commenting, you should simplify your code. That’s ridiculous and those people deserve to be ridiculed. [They're probably mostly Ruby coders.—Ed] Sometimes things are just complicated, and well written comments describing your intent and providing context and explanation for actions is not only essential for other developers to understand your code, but also to help you understand what you were doing when you come back to a project to make changes. (Many a time, I’ve stared at code I wrote for what seemed like hours, trying to remember what the hell I was thinking. Documentation cures this problem. Most of the time.)
Making My Life Easier
We’ve really only scratched the surface as far as things you can do to make your code better. Most of the information in this post is aimed towards making code incrementally easier to read and maintain. We really haven’t even touched on best practices. I wrote this from the perspective of someone who has spent a lot of time maintaining other people’s code, but these are really tips that will improve your code for yourself as well, especially if you’re just getting started.
This becomes especially important with open source projects, where you have a lot of people wanting to really get into your code. Current and previous-generation open source PHP projects have a pretty nasty reputation for containing lots of poorly-written spaghetti code, and frankly, based on most of the projects I’ve seen, it’s deserved. Hopefully organizations like Zend and Varien (authors of Magento —though there is still work to do on that), and community pressure (from folks like us, we like to think) will push the next generation of PHP projects to the next level.
- A Simple mod_rewrite Tutorial: SEO-Friendly, Attractive URLs
- Google’s AJAX-powered Search Results Break Keyword Tracking
- Workaround to Gmail Manager Firefox Extension
- Extending PHP 5.3 Closures with Serialization and Reflection
- Linux CLI Tutorial Part 1—Some Under-Utilized Bash Tools for the CLI Feeble