I started getting “serious” about development because I had a desire never to write lengthy, wandering streams of code again.  It was not for any reason but unadulterated laziness—the kind that so overpowers the better senses as to force a person to spend hours in a chair with the express goal of not spending hours in said chair.  It’s a wild, consuming laziness that seems to know no bounds.

As developers, once we start separating our code into abstract ontological typologies, we make use of the human mind’s phenomenal ability to work with types.  Our code becomes less about jump tables and registers and more about users, email messages and images.  What once was a problem of allocating resources and operations within the computer becomes an abstract, logical problem within a collection of objects.  Like children awe-struck by stories of magicians of old, speaking incantations and pressing their wishes into reality by the power of their mind alone, we become drunk with the sense of awe and possibility.


The magician’s spell is our codebase, and as they toil in their laboratories collecting newts and dragon finger-nails, so we assemble libraries of components and frameworks.  We write ever more diverse and sprawling codebases to power our behemoth, monolithic applications and we grow increasingly unhappy with the amount of time we spend writing the same code over and over.  As we come up with more magic to shorten our tasks, we outpace ourselves by envisaging larger and larger projects.

As we write better, more reusable code, early on in this process we experience the satisfaction of leveraging our components to achieve results.  So developers (those of us who aren’t terrible) get into this groove of reducing the size of the logical elements we write, reusing those elements, or recycling them through extension or currying. Over time, by constantly working to reuse our own code, we choose practices that work well for ourselves and discard practices that don’t work as well or slow down our workflow. For developers flying solo or those working on small projects, this evolutionary process is a sufficient way of going about things.

But there’s trouble when we add other players into the mix (other developers, a user interface person, a database person, a sysadmin, a project mana-jerk): they don’t have access to our “experience” of the code and we don’t have access to theirs.  So the practices, workflows and logic which made sense to us when we built a component for personal reuse may not make nearly as much sense as we’d hope to another developer who tries to work with it. Worse, we don’t learn from these mistakes because we may not even know they’ve been made.  Even if we suspect there’s trouble (perhaps a dead fish in our computer case or threatening instant messages), we still may not know enough about the mistake to effect positive changes in our behavior.

We’ve been doing some extra work on our libraries at the Studios, and we’ve been arguing about why and how some of our many programmes for code re-use have failed, miserably so, and about the surprising things that are still in use today.

Recycling Aluminum, Plastic and Code

I invoked the slogan “reduce, reuse and recycle” earlier because refuse recycling serves as a good metaphor for code reuse, and frankly, because you can take the magician allegory only so far. Recycling is expensive: It has all of the costs of trash collection (save the landfill/incinerator/New Jersey [Hey! - Editor from NJ]) and adds the cost of sorting and additional transportation costs. The idea is that value is recuperated from the actual recycled materials, but that’s not always the case. (See the parallel yet?) From an about.com article on recycling:

Michael Shapiro, director of the U.S. Environmental Protection Agency’s Office of Solid Waste, also weighed in on the benefits of recycling:

“A well-run curbside recycling program can cost anywhere from $50 to more than $150 per ton… trash collection and disposal programs, on the other hand, cost anywhere from $70 to more than $200 per ton. This demonstrates that, while there’s still room for improvements, recycling can be cost-effective.”

But in 2002, New York City, an early municipal recycling pioneer, found that its much-lauded recycling program was losing money, so it eliminated glass and plastic recycling. According to Mayor Michael Bloomberg, the benefits of recycling plastic and glass were outweighed by the price — recycling cost twice as much as disposal. Meanwhile, low demand for the materials meant that much of it was ending up in landfills anyway, despite best intentions.

We can draw parallels between the things that drive the cost of some municipal recycling programs up past those of others and the things that reduce our ability to reuse software.

Consider the following factors:

  • Volume of recyclables / Volume of components to reuse
  • Volume of non-recyclables /  How many components are NOT reused
  • Adoption of the recycling program / Adoption of libraries and frameworks by developers (consistently high usage rates are necessary for effectiveness)
  • Choices as to what to recycle and what not to recycle / Clearly defined boundaries on which components are reusable and which ones are not
  • Time it takes to sort through the material to separate the recyclables / Developer’s ability to know what’s already done and in the library or framework, versus what they have to build, versus what they want to build just for the hell of it
  • Expense to transport material /  Overhead associated with libraries and frameworks and autoloaders and oh my

In software development as in refuse recycling, we naturally want to reuse as much as possible. But the return on recycling can vary greatly from item to item. And when you factor in the cost of setting up a component for reuse (introducing it into the environment, building with proper levels of flexibility and extensibility) and the non-trivial, yet oft-overlooked cost of the time developers spend familiarizing themselves with the way you’ve designed the component, it may not make economic sense to recycle certain components.

It’s up to the savvy developer to decide what will be recycled and what will not be recycled. Sometimes we make poor choices and have components that are stuck into frameworks that we never use. This is a double-whammy, since we spend more time building the component for reuse and designing for scenarios which will never occur. How do we minimize this and select the best candidates for reuse? Let’s get into that.

Choosing What To Recycle

My alma mater, Arizona State University (ASU, go Devils!), has separate recycling bins for plastic and aluminum all around campus.  ASU doesn’t recycle only aluminum and plastic bottles, but these are the most prominent containers around campus. This decision makes a lot of sense because students go through a lot of soft drink cans and water bottles daily, while their paper waste output in public thoroughfares on campus is considerably less. (You normally sort and toss your flyers, notes or other paper propaganda once you get back to your dorm, for instance.)

Arizona State makes up a significant portion of the city of Tempe.  Tempe’s residents output a larger variety of refuse so their recycling effort has tried to be more inclusive of other materials; almost something of a long-tail of recycling, to a point.

It’s too easy to hang oneself in trying to make ever more complex and elaborate artifices in the name of reuse. Deciding what components to build for reusability is an important part of making code reuse feasible. I won’t disclose the number of projects that we took extra time on so that we could reuse them, only to have them languish in a corner of our code repository, never to see the light of day again. Nor will I tell you the dollar amount associated with that number. I will say that it’s significant.

But worse still is the number of hours we’ve spent coding the same component over and over again, each time a new iteration for a new project, when one well-developed go at it would’ve delivered a highly reusable, stable piece of brilliant cost-savings and elegant code. (Realistically, we develop a “reusable” component each time, making it “better” each time.)

The best way to know what components are good candidates for reuse is to know from experience.  If you don’t have experience I suggest you develop a time portal to the future and observe the results of your work before doing it. My preferred “time portals” are books on design patterns:

Martin Fowler’s “Patterns of Enterprise Application Architecture”, Alur et al.’s “Core J2EE Patterns”, Gamma et al.’s “Design Patterns” and Matt Zandstra’s “PHP Objects, Patterns, and Practice” are all great resources for problems that come up a lot and solutions that have worked, the last of these being written for PHP. Beyond that, researching how things are done in other channels of software development (or even other industries) can provide inspiration on what can be used and reused.

Once you’ve identified what types of components you’d like to reuse you must ensure that your components are, first and foremost, usable. It sounds simple, but a lot of people overlook this important requirement.

What is Actually Recyclable

I wrote a JavaScript form validation framework for a project some time back that failed horribly over the course of three projects due to my comfort with its complexity. And I was in no way a “newb” programmer at the time I developed the code. Form validation essentially came down to three components:

First, there was a messaging bus which allowed me to drop JavaScript message objects onto a function and have them organized and displayed to the user with pretty colors and effects.
Second, I wrote a component to highlight form fields in different ways, and to present contextual messages near the elements.
Third, I wrote a component to attach handlers to a form and validate the data, cancelling the form submission and running its own handler when validation failed.

I was intent on keeping these parts separate so that I could switch them out as different projects required. This was not a bad thing in and of itself, but it turned bad like prom night with the quickness.

As it turned out, it was too complex.  The three components had to be tied together each time, and that sucked.  I wanted to be able to switch out components with minimal incremental effort. Unfortunately, the significant effort involved in tying everything together was too much to ask from our developers.

In the middle of developing a component that we wish to reuse we are actually handicapped by the knowledge of what the component does and how it does it. When we get feedback from other developers, their impressions are often not as descriptive as we need them to be to be able to refine our code to be more reusable. What seemed like perfectly acceptable amounts of coding to me (heck, I just wrote the thing) was a bear-like chore for others, so I ended up implementing all of the validation—a chore that was horrible itself because, as it turned out, they were right about it requiring too much coding.

Validation required much of the fine-grained control that I had written, so I decided that my failure had to be one of documentation. I assembled some pages that defined the API and had example code for common use cases, but that didn’t fix anything. After watching Chris and David not use it, I left it for a while, came back, forgot how it worked, and realized how awful it was. I looked at what I was doing and realized that I was essentially rewriting much of the same code over and over. So I built a new type of validator, the SimpleValidator.

The SimpleValidator allows you to assign a function to a named element in a form, and if that function returns a string, validation fails and the user is shown all the strings collected this way.  Everything else happens magically, behind the scenes.  To create a SimpleValidator, you call one of the factory methods on the constructor that tailors things to your concerns.

The last thing I did was make a library of functions for common validation “schemes”, like email addresses, names, numbers, zip codes, et cetera, to streamline the process further. I avoided coupling the components by simply defining the factories for the different objects in a SimpleValidator settings object.
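To make that concrete, here is a minimal sketch of the idea rather than the original code. Every name in it (Rules, fromTypes, validate) is an assumption made up for illustration, and it validates a plain object of field values instead of a live form so it can run anywhere:

```javascript
// Hypothetical sketch: names and signatures are illustrative, not the original API.

// A small library of common validation "schemes". Each rule returns an
// error string on failure and undefined on success.
const Rules = {
  required: (value) =>
    String(value ?? "").trim() === "" ? "This field is required." : undefined,
  email: (value) =>
    /^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(value)
      ? undefined
      : "Enter a valid email address.",
  zip: (value) =>
    /^\d{5}(-\d{4})?$/.test(value) ? undefined : "Enter a valid ZIP code.",
};

function SimpleValidator(rules) {
  this.rules = rules; // map of field name -> validation function
}

// Factory tailored to one use case: a plain list of which field is which type.
SimpleValidator.fromTypes = function (types) {
  const rules = {};
  for (const [name, type] of Object.entries(types)) {
    if (typeof Rules[type] !== "function") {
      throw new Error("Unknown validation type: " + type);
    }
    rules[name] = Rules[type];
  }
  return new SimpleValidator(rules);
};

// Runs every rule against a { fieldName: value } object and collects each
// returned string; an empty result means the form is valid.
SimpleValidator.prototype.validate = function (values) {
  const messages = [];
  for (const [name, rule] of Object.entries(this.rules)) {
    const result = rule(values[name], values);
    if (typeof result === "string") {
      messages.push({ field: name, message: result });
    }
  }
  return messages;
};

const validator = SimpleValidator.fromTypes({ email: "email", zip: "zip" });
const errors = validator.validate({ email: "not-an-address", zip: "85281" });
console.log(errors);
```

The caller only lists field names and types; wiring the result to a form, a highlighter or a message display would live behind the factory.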

That worked well 98% of the time. For the other 2% we could still dig through my documentation and figure out how to add the element via the older interfaces so that the required behavior could be accomplished at the cost of making the coupling concrete—a tradeoff we accepted happily.

Our final version was reusable because it had factories that corresponded to use cases. Adding a simple form validation script required little more than a list of what elements were of what type. Extending this behavior was easy as well, as the developer could simply wrap the existing validation function in another function which had further behavior defined. And since the low level mechanics were still all there and documented, when times got rough we simply reverted to that API and were able to achieve our desired behavior. I’ve been referring to this property of a software component as having a “degrading API”.
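The wrapping trick mentioned above might look something like the following. This is a sketch under assumptions: emailRule, withExtraCheck and the example.com rule are hypothetical names, and the only convention borrowed from the validator is "return a string to fail".

```javascript
// Hypothetical names throughout; only the wrapping technique is the point.

// An existing rule: returns an error string on failure, undefined on success.
function emailRule(value) {
  return /^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(value)
    ? undefined
    : "Enter a valid email address.";
}

// Wrap a rule with extra behavior without touching the original. The base
// rule runs first; the extra check only runs if the base rule passed.
function withExtraCheck(baseRule, extraCheck) {
  return function (value) {
    const baseResult = baseRule(value);
    if (typeof baseResult === "string") {
      return baseResult;
    }
    return extraCheck(value);
  };
}

const companyEmailRule = withExtraCheck(emailRule, (value) =>
  value.endsWith("@example.com") ? undefined : "Use your example.com address."
);
```

Because the wrapper has the same shape as the rule it wraps, it can be dropped in anywhere the original rule was accepted.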

If we did not have this low level access to the component, the product would be abandoned when times got tough, new solutions would be introduced into the code base, and along with them we’d be adding confusion as to which component to use where. This happened with our security component some time back, and it was horrible. During testing no one knew what was going on when a security fault occurred. We just ran around flailing and screaming until someone would hug us down from our panic. [It's true, we prefer to be hugged down from our many, many panics.—Ed]

With form validation, we did not just create a degrading API for developers to work with, we created the right API for them to work with. In short, we read their freaking minds. You’re a bit spooked by this thought, I can sense that. Calm down—deep breaths. I’ll show you how.

Ensuring Recyclability – Communication

One way for those unfortunate souls among us who cannot read minds to arrive at acceptable component boundaries and API is to talk to the other developers in the team and get their input.

For another project we needed to build a star rating system. This means we needed an input for users to select a number of stars, and an output that would show the number of stars an item has received. After some time with Photoshop, I decided to use one image of stars in different states. I’d reuse this image for each of the 5 stars and have hidden radio buttons storing the state of the object. After talking with our JavaScript guy (me) about whether I should do this in PHP or in JS, we decided JS was the way to go. You’d be surprised the number of times an internal dialogue-turned-external has solved problems AND frightened everyone else in the office to their very core.

Next I talked to the developers who would be working with this object and the CSS guy who would be styling this site. The developers liked the idea of just using an input element and having everything happen magically, but the stylists and JavaScripters were concerned about having low level access to the internals. We ended up doing the following:

  • Created a settings singleton that can contain much of the complexity of defining each Stars object (This allows developers to use the default settings, modify it to suit their own needs, and forget it, Ron Popeil style.)
  • Created a constructor in JavaScript that allows for an arguments object where we can override the settings singleton. This way developers can keep their settings in a convenient, easy-to-use object.
  • Created a static factory method which converts an input element of type="star" into our model, copying the required attributes (such as class, style and id) as necessary.
  • Created a static method that trawls (getElementsByTagName) the DOM’s input elements and replaces the type="star" ones with Stars objects.
  • Honored a “disable” flag set in the constructor arguments, in the settings, or as a disabled attribute on the input element; when set, the dynamic behavior is not added, making it a presentation-only element.

This allowed our HTML guys to write things like <input type="star" name="rating" value="4.5" size="5" /> and <input type="star" value="4.5" size="5" disabled="true" /> and have it automagically created. The code is semantic—you understand what it is doing by reading it.   The JavaScript people (that’s me again) demanded error reporting through window.console if it exists, so we added that as well. Then we documented the crizazzle out of it. In this way, the API also degrades across our levels of concern for customization.
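The layering in that list (shared settings, a constructor that overrides them, a factory for one element, a trawler for the whole page) can be sketched roughly as follows. StarsSettings, fromInput and fromDocument are assumed names, not the original ones, and the sketch only reads attributes; drawing the stars and swapping elements in the page is left out.

```javascript
// Hypothetical sketch; the setting names and method names are assumptions.

// Shared defaults: set once, reused by every Stars instance.
const StarsSettings = {
  max: 5,
  value: 0,
  disabled: false,
};

// Constructor: per-instance arguments override the shared settings.
function Stars(args) {
  Object.assign(this, StarsSettings, args || {});
}

// Static factory: builds a Stars object from an <input type="star"> element.
// Only getAttribute/hasAttribute are used, so any element-like object works.
Stars.fromInput = function (input) {
  const args = { disabled: input.hasAttribute("disabled") };
  const size = parseInt(input.getAttribute("size"), 10);
  const value = parseFloat(input.getAttribute("value"));
  if (!Number.isNaN(size)) args.max = size;
  if (!Number.isNaN(value)) args.value = value;
  for (const name of ["name", "id", "class", "style"]) {
    const attr = input.getAttribute(name);
    if (attr !== null) args[name] = attr;
  }
  return new Stars(args);
};

// Static method: finds every input of type "star" under a root node.
// Replacing the elements in the page is left out of this sketch.
Stars.fromDocument = function (root) {
  return Array.from(root.getElementsByTagName("input"))
    .filter((input) => input.getAttribute("type") === "star")
    .map((input) => Stars.fromInput(input));
};
```

In a browser, Stars.fromDocument(document) would pick up every matching input. The type is read with getAttribute because browsers report an unrecognized type as "text" on the element's type property.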

Using <input type="star" name="{element name}" value="{float}" size="{max}" /> seemed to encapsulate the required logic for incorporating star-ratings into a web form to such a degree that it became almost memetic. Using any other way to define the star-ratings seemed unnaturally complex and offputting. It was easy to integrate the feature all across the project and into other projects. That was the sure-fire sign that we had something good.

Ensuring Recyclability – Composition

Whenever I hear “API,” my mind automatically goes to a dark place of interface definitions and return type declarations for some class. This is fine for when we’re addressing a concern unique to a specific class, but if we wish to add functionality across many classes we run into trouble with that dark visage.

One place in which we’ve seen this play out is in the Observer pattern. The Standard PHP Library (SPL) provides the SplSubject and SplObserver interfaces, and this leads people down the natural road of inheritance. Indeed, many projects use a God class from which all other objects extend. This sucks for three reasons I’ll explain and a few more that you can discover for yourself.

First, since late static binding is still due in PHP 5.3, there is difficulty in knowing what class you’re in if you’re not in an instance.
Second, it sucks because everything gets that logic whether you want it to or not.
Third, it sucks if you want to use two such projects since they each demand their own “one true God” class. It is quite the jihad to get them to work together, ha ha, no I kid, I kid.  But seriously, if you make the God classes extend another class, then updating your libraries will overwrite the changes and you’re stuck going through re-applying the changes, and it’s nothing short of a crusade. [Way to diversify your religious warfare references.—Ed]

The alternative is to implement the interface each time. The book “PHP Objects, Patterns, and Practice” by Matt Zandstra (which is an excellent book for which I have both the first and second editions) suggests doing just that (pg 204, 2nd ed.). But such an approach is so unpalatable that there is talk of adding mixins to the PHP language proper so that you can define functions once and have them mixed into your object definitions as a kind of pseudo-multiple-inheritance. We couldn’t wait for the language to change, so we had to come up with a solution then and there.

Martin Fowler talks about the often underused power of composition in PEAA, and we decided that his books had enough pages to be right. We decided to create an Event object that would handle all of the event interactions and be composited within our existing objects. (Later on we discovered that Symfony uses a similar approach.)  This made adding event capabilities to existing classes very simple. We could even decorate (via extension) most classes that we had no control over and give them event support.  In the worst case situation, we could have an event registry that dispatches the proper event object for each object given as argument. 

Our composition programme took about four hours with unit tests. First, we made the constructor protected and created a factory function that took a string or object, and an array of event types and returned an event manager. The factory always returns the same object (===) for whatever string or object is passed in as an argument and registers the event types with the event manager. When it registers the event types it adds new ones as they appear in the array. This means that if a parent constructor creates an event manager, and then our constructor “remakes” it, there will still only be one event manager linked to our object and it will have a valid list of all the available event types as well.
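The original was PHP; what follows is a JavaScript analogue of the factory behavior described above, with every name (EventManager, forTarget, the token standing in for a protected constructor) invented for illustration. It is a sketch of the "same manager for the same key, event types accumulate" rule and nothing more.

```javascript
// JavaScript analogue of the PHP design described above; all names are
// hypothetical. A private token stands in for PHP's protected constructor.
const INTERNAL = Symbol("EventManager.internal");
const byName = new Map();       // managers keyed by string
const byObject = new WeakMap(); // managers keyed by object

class EventManager {
  constructor(token) {
    if (token !== INTERNAL) {
      throw new Error("Use EventManager.forTarget() instead of new.");
    }
    this.types = new Set();
  }

  // Factory: always returns the same manager for the same key, and adds
  // any event types it has not seen before.
  static forTarget(key, eventTypes = []) {
    const store = typeof key === "string" ? byName : byObject;
    let manager = store.get(key);
    if (!manager) {
      manager = new EventManager(INTERNAL);
      store.set(key, manager);
    }
    for (const type of eventTypes) {
      manager.types.add(type);
    }
    return manager;
  }
}

class Model {
  constructor() {
    this.events = EventManager.forTarget(this, ["beforeSave", "afterSave"]);
  }
}

class Article extends Model {
  constructor() {
    super();
    // "Remaking" the manager returns the parent's instance with the new type added.
    this.events = EventManager.forTarget(this, ["onToString"]);
  }
}

const article = new Article();
```

After construction, article.events is the one manager created by the parent constructor, and it knows all three event types.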

We created the event manager with an array access interface. The interface maps the name of the event to an event host that was contained in the manager. The host implements SplSubject and is the point of contact for the event. Then we added the SplObserver interface to our function reference class and demanded that event registrars use SplObserver.

The end result is that we can add event handling capabilities to a class by simply making a decorator with an Event_Manager $events property, and we can cause things to happen as simply as $this->events['onToString']->fire(); and $Obj->events['beforeSave']->attach( $FuncRef );.  We also created an event class that is passed to the listeners. This event allows the listeners to cancel the event, perform actions on the firing object, or modify the values being output (like the “Intercepting Filter” pattern from Alur et al).
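As a rough illustration of how attach, fire and a cancellable event object fit together, here is a JavaScript analogue of the PHP calls above. The class names (FiredEvent, EventHost, Post) and the shape of the event object are assumptions; only the attach/fire vocabulary comes from the original.

```javascript
// Hypothetical JavaScript analogue; method names mirror attach/fire above.
class FiredEvent {
  constructor(type, target, value) {
    this.type = type;
    this.target = target; // the object that fired the event
    this.value = value;   // listeners may replace this
    this.cancelled = false;
  }
  cancel() {
    this.cancelled = true;
  }
}

class EventHost {
  constructor(type, target) {
    this.type = type;
    this.target = target;
    this.listeners = [];
  }
  attach(listener) {
    this.listeners.push(listener);
  }
  // Calls listeners in order and stops early if one cancels the event.
  fire(value) {
    const event = new FiredEvent(this.type, this.target, value);
    for (const listener of this.listeners) {
      listener(event);
      if (event.cancelled) break;
    }
    return event;
  }
}

// Composition: the class holds event hosts in a property, no base class needed.
class Post {
  constructor(title) {
    this.title = title;
    this.events = {
      beforeSave: new EventHost("beforeSave", this),
      onToString: new EventHost("onToString", this),
    };
  }
  save() {
    const event = this.events.beforeSave.fire(this.title);
    return !event.cancelled; // a real save would persist here
  }
  toString() {
    return this.events.onToString.fire(this.title).value;
  }
}

const post = new Post("draft");
post.events.onToString.attach((event) => {
  event.value = event.value.toUpperCase();
});
post.events.beforeSave.attach((event) => {
  if (event.target.title === "draft") event.cancel();
});
```

Here one listener rewrites the output of toString and another vetoes the save, and neither required Post to extend anything.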

The Carrying Cost


The cost I most overlook when it comes to software reuse is the carrying cost: that is, how much it costs to bring your reusable components into your workspace and make use of them.  On the web, this problem is significantly larger because we’re often straddling multiple languages that may lack an engineered solution for ensuring the proper resources are made available to the browser.  PHP’s autoload functionality is nice, but don’t ask it for an image referenced in a JavaScript include.  You’ll just be disappointed (like junior prom).  A further problem for PHP is that we’re building up and tearing down everything on each request, so we need methods of importing components to be shallow and cacheable.

The problem I kept having was that I needed to pass information from PHP’s environment to JavaScript imported into an XHTML document, and in that JavaScript document, I needed to reference other resources that were relative to the script file.  CSS gets this right in that references to images in external CSS documents are read relative to the CSS document path.  JavaScript, surprise, does no such thing.  Plus, some scripts need to occur at certain locations in the document (which can be surmounted by the event stack in IE or the event queue in Firefox; if you feel the urge to cut yourself, you’re doing it right.) Finally, some scripts must be included before others, but I didn’t want to throw errors when someone imported one component before another. 
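For what it’s worth, the script-relative lookup can be sketched in a few lines with the URL constructor found in current browsers and in Node. This is not how we solved it at the time; the function name and the example paths are made up for illustration.

```javascript
// Resolves a resource path relative to the script that references it,
// the way CSS resolves url() against the stylesheet's own location.
// In a browser, scriptUrl could come from document.currentScript.src.
function resolveFromScript(relativePath, scriptUrl) {
  return new URL(relativePath, scriptUrl).href;
}

// Hypothetical example paths.
const starsImage = resolveFromScript(
  "../images/stars.png",
  "https://example.com/assets/js/stars.js"
);
console.log(starsImage);
```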

We built something called Axon, which, for lack of a better description, allows for something like Java’s import statements.  A package definition file (there can be any number of them) describes resource names like “Synapse.Forms.Validation” and maps them to required resource names (dependencies) and resource paths associated to channels.  A channel was the end resource type, such as “CSS” or “JavaScript”, and mapped onto the Loader to manage which things are imported.  Further, Axon recursively met the dependencies to ensure things entered the environment in the proper order.  Lastly, we added a service definition so that a resource could describe itself as providing, say, “sendEmail”, and allowed the loader to be queried for a service.
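The dependency-ordering part of that description can be sketched as a depth-first walk over the package definitions. This is an illustration under assumptions, not Axon’s code: the definition format, the function names and every resource name and path other than “Synapse.Forms.Validation” are invented.

```javascript
// Hypothetical package definitions; resource names other than
// "Synapse.Forms.Validation" and all paths are invented for illustration.
const packages = {
  "Synapse.Bus": {
    requires: [],
    channels: { JavaScript: ["js/bus.js"], CSS: ["css/bus.css"] },
  },
  "Synapse.Forms.Highlight": {
    requires: [],
    channels: { JavaScript: ["js/highlight.js"] },
  },
  "Synapse.Forms.Validation": {
    requires: ["Synapse.Bus", "Synapse.Forms.Highlight"],
    channels: { JavaScript: ["js/validation.js"] },
  },
};

// Depth-first walk: dependencies are emitted before the resource that
// needs them, each resource once, and cycles are reported as errors.
function resolve(name, defs, done = new Set(), active = new Set(), order = []) {
  if (done.has(name)) return order;
  if (active.has(name)) throw new Error("Circular dependency at " + name);
  const def = defs[name];
  if (!def) throw new Error("Unknown resource: " + name);
  active.add(name);
  for (const dep of def.requires) {
    resolve(dep, defs, done, active, order);
  }
  active.delete(name);
  done.add(name);
  order.push(name);
  return order;
}

// Collects the paths for one channel in load order.
function pathsFor(name, channel, defs) {
  return resolve(name, defs).flatMap((res) => defs[res].channels[channel] || []);
}

console.log(pathsFor("Synapse.Forms.Validation", "JavaScript", packages));
```

Importing a component in the wrong order stops being an error, since asking for the validation package pulls in the bus and the highlighter first.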

Axon “worked” in that it made using our messaging bus system and form validation code easy. We used it everywhere.  It failed in that we never used services, and no one remembers if caching is on or off, causing comically bad situations where bug fixes don’t work or, alternatively, performance tanks.  Both great options.

Also, I was pretty much the only person to ever write things into Axon, and I never documented it well.  We ended up using the PEAR-style name-to-directory convention for PHP objects and directly modifying our templates for most code for what I am told are “performance issues.”

The toe-hold that Axon had with the bus and form validation meant that it stuck around, and we slowly began to add more things to it that were useful (although the issues with caching persisted).  But the down side of sticking around is that we’re stuck with it.  No mana-jerks want to budget time to give it an overhaul, and we developers couldn’t agree on how to fix it.  Currently we’re rotating out a portion of our Canopy framework in favor of Zend’s framework so we might abandon Axon altogether (although we’ll keep the name for something else), but if we weren’t in such turmoil and tumult we might well be stuck with Axon ad infinitum.

Hard Choices About Recycling

Some people (probably from the Drupal project) will note that composition has more performance overhead than directly defining the observers and subjects using the SPL interfaces. Yes, that’s true, especially in PHP. But that’s nothing compared to the performance overhead of PHP itself—you can make Java do some wicked fast things if you’ve got the time to write it. But we write in interpreted languages like PHP because it speeds development and deployment. That leaves us with some hard choices to make. For code recycling to work, like real recycling, it has to save money.

For projects like Drupal and WordPress, performance is an issue since people will be adding to them, and if the project is a performance hog to begin with, these additions will bog down performance further and the product will become unusable.  But performance is neither paramount nor sacrosanct.  You can spend real cheese on optimizing a system where adding more hardware may be a much cheaper solution.  Many projects we work on are fluid; that is to say, they require the ability to quickly and cheaply add new features and change according to their communities’ needs.  The added time for developers to work through the mess of optimization often is more expensive than just adding hardware to compensate for less-than-stellar code.

Now I’m not advocating solving all performance issues by throwing hardware at them.  That’s a terrible idea.  Terribly terrible.  But look at the cost of resources and where your project is going.  It may make sense (apologies for what I’m about to say) to prototype and start with PHP and then create a Java-based solution when the project is mature (maybe a few years later).

As PHP developers we find ourselves in a community that prides itself on open source. That pride means that every single one of us has downloaded WordPress, Drupal and Magento, looked at the source, and wondered “What the F double hockey sticks is this about?” With open source, the reuse ballgame is much wider. We’re targeting and being the targets of a much larger community of developers.

In open source projects, your code has to make sense to users from China to Oklahoma and let me tell you, there are some cultural divides there.  The documentation has to be written and translated.  And the scale of the larger open source projects usually means that reuse is limited to building your own items into their pre-existing frameworks. I always hope to find a tier separated from display so that I can invoke and operate on an application’s data object, but it rarely ever works out that way. Perhaps it’s a case of cyber-myopia, but the successful large OSS projects seem to emphasize ease of deployment and customization over individual component reuse. This is not the case for many of the projects self-identifying as “frameworks”.

There is an explosion of PHP frameworks and JavaScript frameworks, and the ones that have made themselves available as toolkits as well as frameworks (or, as David likes to call them, “component frameworks”) have been able to weather the storm (and garner more attention).  As with my form validation class, I think success in the open source world requires a low cost of deployment, degrading APIs and proper separation of concerns.

By making your code open source and available you can get feedback worth its weight (if printed on gold paper) in gold. Further, people who use the project will also make improvements. I posted a REST interface to Amazon’s S3 in response to a request on the Zend Framework list some time ago, and the requester kindly made it function with streams (which I knew nothing about at the time). In relying on the (all too) true fact that there are people way smarter than me on the internet, we both got something very useful.

Conclusion

My psychic abilities tell me you’re wondering why this wall of text was worth your time. It probably wasn’t. (This is especially true if you’re Martin Fowler, man am I ever wasting your time, Martin, and I’m sorry.) For PHP developers especially, notice that as we make our code more meaningful on its face, we may be sacrificing speed on the processor but we’re saving huge amounts of development time, bug tracking time and fees for busted keyboards and restraining order hearings. That’s the hard sell: as developers we have to deal with costs, and as PHP developers we have the advantage of already being on a “slow” interpreted language rather than writing assembly into our C++ for speed.

By focusing on reuse we gain savings directly and indirectly, as reusable components serve as a vocabulary to discuss a project.  We all feel the need for it, but it’s difficult to secure a way to ensure reuse. In an effort to sum up, I am suggesting requiring the following in your own projects:

  • Require that concerns be addressed one domain at a time, and composite those solutions as needed
  • Focus on writing semantic code (for which the meaning is obvious from its structure)
  • Document your code and include example use cases
  • Documentation should always include a description of what you are trying to achieve as well as what you are actually achieving with your design.  Be honest
  • Use composition over inheritance when adding features to multiple classes from the past, present and future (mainly when spanning different codebases)
  • Allow for high level uses via factories
  • Allow for low level uses via constructors and factories.
    (When PHP 5.3 lands, don’t let your constructors be public)
  • When something will be constant for a long time, provide a place to put that data once and reuse it, don’t demand it each time
  • Talk to people who will be developing with this piece of code, or pretend you are them, or actually be them and learn what concerns they have
  • Imagine refactoring an older project to use this code. Ask yourself what problems you foresee, and how you can make design decisions now to prevent these problems
  • Release your code to a community that can use it (and ensure they can use it by releasing it under GPLv3 or better.) Encourage feedback and participation
  • Drink a lot of vodka and red bull and black out when you code so that you can approach it with a fresh mind the following day
  • Request comments at every stage,  examine the gap between what they expect and what you’re planning on building, but ignore feature requests that would cause you to drift off target.  Do not try to complete components all at once, rather let it be something to which you come back.  Don’t rush it, just let it happen, baby
  • Do not fear trading small amounts of performance for semantic structure. You’re not developing mainframes or nuclear stockpile simulators, you’re building web apps in PHP and more often than not the most expensive thing is you, so pamper yourself
  • Be able to say that something is not a good candidate for re-usability.  If, over the course of time, you begin to notice yourself re-writing it again and again in a way that can be abstracted, then go for it, but it’s okay if not everything is part of a library
  • Be willing to use (or willing to consider using) existing libraries for reusable components.  It will get you in the mindset of your community and provides you with a bunch of tools you don’t have to build yourself

If you do these things, you’ll get rich quick working from home.  No, not really, you won’t, at least you probably won’t, because the results I’ve cited are not typical.

In fact, in the spirit of this article I hope any reader who makes it to this point writes a post on their own experience: what’s worked and what’s failed.  My psychic abilities tell me that several of you read this and said “whoa, that’s a recipe for a performance disaster,”  and when projects explode in popularity this criticism is most certainly true.  My “Highly Available Enterprise Application Architecture with PHP” post is twice as long and three times as ornery.  Also it is entirely written in Tamarian allegory regarding Darmok and Jalad.  But with Thrift, Smarty-like templates, EC2, a smart caching strategy, APC, memcached and a little shuffling of resources you can rebound.

Now go hit the vodka red bulls and let us know your thoughts.
