URL Rewriting - Content Negotiation



Article by Thomas A. Powell and Joe Lima from Port80 Software

Rewrite query strings - URL Rewriting

In cases where pages must remain dynamic, it is still possible to clean up their query strings. Simple cleaning usually remaps the ?, =, &, and + symbols in a URL to more easily typed characters. Thus, a URL like http://www.xyz.com/presssearch.asp?key=New+Robot&year=2003&view=print might become something like http://www.xyz.com/presssearch.asp/key/New-Robot/year/2003/view/print. While this makes the page "look" static, it is still dynamic. The cleaned URL is a little less intimidating to users and may be more search engine friendly as well (search engines have been known to stop at the ? character). In conjunction with the next tip, this might even discourage URL parameter manipulation by would-be site hackers, who can no longer tell a dynamic page from a static one.
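
To make the mapping concrete, here is a minimal Python sketch of the cleaning step, assuming the convention used in the example URL: each key=value pair becomes /key/value, and the + that encodes a space becomes a hyphen. The function name to_clean_url is hypothetical; neither mod_rewrite nor ISAPI Rewrite provides anything by that name.

```python
from urllib.parse import parse_qsl, quote, urlsplit


def to_clean_url(url: str) -> str:
    """Rewrite a query-string URL into a path-style URL (hypothetical helper).

    Assumed convention: each key=value pair becomes /key/value, and spaces
    (written as + in the query string) become hyphens.
    """
    parts = urlsplit(url)
    segments = []
    for key, value in parse_qsl(parts.query):
        # parse_qsl has already decoded '+' to a space; map spaces to hyphens.
        # quote(..., safe="") escapes any '/' so it cannot split a segment.
        segments.append(quote(key, safe=""))
        segments.append(quote(value.replace(" ", "-"), safe=""))
    path = parts.path.rstrip("/")
    if segments:
        path += "/" + "/".join(segments)
    return f"{parts.scheme}://{parts.netloc}{path}"


print(to_clean_url("http://www.xyz.com/presssearch.asp?key=New+Robot&year=2003&view=print"))
```

Note that this convention is lossy: a value that already contained a hyphen can no longer be distinguished from one that contained a space, which is one reason this kind of rewriting needs planning.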

The challenge with URL rewriting is that doing it well takes significant planning, and the primary tools for the job, rule-based URL rewriters like mod_rewrite for Apache and ISAPI Rewrite for IIS, have a rule syntax that can be daunting for developers who are not experienced with regular expressions. However, the effort required to learn these tools properly is well worth it.
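
On the server, the rewriter has to do the reverse: match the clean URL against a regular expression and hand the script the query string it expects. The Python sketch below imitates a single such rule for the example URL. The pattern, the restriction to .asp scripts, and the hyphen-to-space convention are assumptions made for illustration; this is not mod_rewrite or ISAPI Rewrite syntax.

```python
import re
from urllib.parse import unquote, urlencode

# Hypothetical rule: a script name ending in .asp, followed by zero or more
# /key/value pairs. Illustrative only, not taken from either product.
CLEAN_URL_RULE = re.compile(
    r"^(?P<script>/[\w.-]+\.asp)(?P<pairs>(?:/[^/]+/[^/]+)*)/?$"
)


def to_internal_url(path: str) -> str:
    """Map a path-style URL back to the query-string form the script expects."""
    match = CLEAN_URL_RULE.match(path)
    if match is None:
        return path  # no rule matched: pass the request through unchanged
    segments = [unquote(s) for s in match.group("pairs").split("/") if s]
    # Even positions are keys, odd positions are values; hyphens are turned
    # back into spaces, undoing the assumed cleaning convention.
    params = [
        (segments[i], segments[i + 1].replace("-", " "))
        for i in range(0, len(segments), 2)
    ]
    if not params:
        return match.group("script")
    # urlencode writes spaces as '+', restoring e.g. key=New+Robot
    return match.group("script") + "?" + urlencode(params)


print(to_internal_url("/presssearch.asp/key/New-Robot/year/2003/view/print"))
```

In mod_rewrite the same pattern-and-substitution idea is expressed with the RewriteRule directive; the logic is no harder than this, but the terse regular-expression syntax is where the learning curve lies.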

Content Negotiation - Remove extensions from files in URL and source

Probably the most interesting URL improvement that can be made involves content negotiation. Although it has long been part of the HTTP specification, content negotiation is rarely used on the web today. The basic idea is that the browser sends the server information about the resources it wants or can accept (preferred MIME types, languages, supported character encodings, and so on), and the server uses this information, together with its own configuration, to determine which content and format to send back. Metaphorically, the browser and the server negotiate over which of the available representations of a given resource is the best one to deliver, given the preferences of each side. This means that a user can request a URL like http://www.xyz.com/products and the language of the returned content can be chosen automatically, so that the content comes from a file like products-en.html for English-speaking users or products-es.html for Spanish speakers. Technology choices such as file format (PNG or GIF, XHTML or HTML) can also be settled through content negotiation, allowing a site to support a range of browser capabilities transparently to the end user.
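
The language example can be sketched in a few lines. The Python code below parses an Accept-Language header, orders the language ranges by their q-values, and picks the first one that has a matching file. It is a simplification of what a module such as mod_negotiation does: the variant table and the fallback from a regional tag like es-MX to its primary subtag es are assumptions, and the other dimensions (media type, charset, encoding) are ignored.

```python
from typing import Dict, List, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Split an Accept-Language header into (language, q) pairs, best first."""
    ranges = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        lang, _, params = item.partition(";")
        q = 1.0  # a range without an explicit q-value has q=1
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges.append((lang.strip().lower(), q))
    # sorted() is stable, so ranges with equal q keep their header order
    return sorted(ranges, key=lambda pair: pair[1], reverse=True)


def choose_variant(header: str, variants: Dict[str, str], default: str) -> str:
    """Return the file for the most preferred available language."""
    for lang, q in parse_accept_language(header):
        if q <= 0:
            continue  # q=0 explicitly marks a language as unacceptable
        primary = lang.split("-")[0]  # "es-mx" -> "es" (simplifying assumption)
        if lang in variants:
            return variants[lang]
        if primary in variants:
            return variants[primary]
    return default


# Hypothetical variant table for the /products example
VARIANTS = {"en": "products-en.html", "es": "products-es.html"}
```

A visitor whose browser sends es-MX,es;q=0.9,en;q=0.8 would receive products-es.html, while a request that matches neither variant falls back to whatever default the server is configured with.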

Content negotiation not only allows developers to present alternate representations of content but also has a significant side effect: URLs can become completely abstract. For example, a URL like http://www.xyz.com/products/robot, where robot is not a directory but a file named without its extension, works perfectly well when content negotiation is employed. The actual file used, be it robot.html, robot.cfm, robot.asp, or something else, is determined by the negotiation rules. Abstracting away the file extension has two significant benefits. First, security improves because potential hackers can't immediately identify the site's underlying technology. Second, with the extension gone from the URL, the developer can change the technology at will without breaking links. If you consider URLs to be, in effect, function calls to a web application, cleaned URLs introduce the very basics of data hiding.
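
To see how an extensionless URL can be resolved, consider the minimal Python sketch below. It looks in the requested directory for files whose base name matches the last path segment and picks one by a fixed extension preference. The preference list and the function name are hypothetical; a real negotiating module decides using the request headers and its own configuration rather than a hard-coded list.

```python
from pathlib import PurePosixPath
from typing import Optional, Set

# Hypothetical server-side preference order, used only when several
# files share the requested base name.
EXTENSION_PREFERENCE = [".html", ".asp", ".cfm"]


def resolve(request_path: str, files_on_disk: Set[str]) -> Optional[str]:
    """Map an extensionless URL path to a file that exists on the server."""
    if request_path in files_on_disk:
        return request_path  # the URL names a real file; nothing to negotiate
    requested = PurePosixPath(request_path)
    # Candidate variants: same directory, same base name, any extension.
    candidates = {
        PurePosixPath(f).suffix: f
        for f in files_on_disk
        if PurePosixPath(f).parent == requested.parent
        and PurePosixPath(f).stem == requested.name
    }
    for extension in EXTENSION_PREFERENCE:
        if extension in candidates:
            return candidates[extension]
    return None  # no variant found: the server would answer 404
```

Because the URL /products/robot never mentions .asp, replacing robot.asp with robot.cfm on the server changes nothing for the links that point to it.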

URLs can be cleaned server-side using a web server extension that implements content negotiation, such as mod_negotiation for Apache or PageXchanger for IIS. However, installing a filter that performs content negotiation is only half of the job. The URLs referenced in HTML and other files must also have their file extensions removed to realize the abstraction and security benefits of content negotiation. Removing file extensions from source code is easy enough with search and replace in a web editor like Dreamweaver MX or HomeSite, and tools like w3compiler are also being developed to improve page preparation for negotiation and transmission. One word of reassurance: don't jump to the conclusion that your files can no longer be named page.html. On your server, the precious extensions are safe and sound. Content negotiation only means that the extensions disappear from source code, markup, and typed URLs.
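
The search-and-replace step can also be scripted. The Python sketch below strips a few page extensions from relative href values while leaving external links alone. The extension list is an assumption, and a regular expression over markup is a blunt tool: this one only handles double-quoted href attributes with no spaces around the equals sign, so review the results, or use an HTML parser, for anything beyond a simple site.

```python
import re

# Hypothetical list of page extensions to strip; adjust for your site.
PAGE_EXTENSIONS = ("html", "htm", "asp", "cfm", "php")

# A double-quoted href whose value has no URL scheme (so external links
# and mailto: are left alone) and ends in one of the page extensions,
# optionally followed by a ?query or #fragment.
HREF_WITH_EXTENSION = re.compile(
    r'(\bhref=")'                      # group 1: the attribute opening
    r'(?![A-Za-z][A-Za-z0-9+.-]*:)'    # skip values that start with a scheme
    r'([^"?#]*?)'                      # group 2: path without the extension
    r'\.(?:' + "|".join(PAGE_EXTENSIONS) + r')'
    r'([?#][^"]*)?"',                  # group 3: optional query or fragment
    re.IGNORECASE,
)


def strip_link_extensions(markup: str) -> str:
    """Remove page extensions from relative href values in markup."""
    return HREF_WITH_EXTENSION.sub(
        lambda m: m.group(1) + m.group(2) + (m.group(3) or "") + '"', markup
    )
```

Only the references change; the files on the server keep their extensions, which is exactly the point made above.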

Automatically spell check directory and file names entered by users

The last tip is probably the least useful, but it is the easiest to implement: spell check your file and directory names. If a user misspells a file name, makes a typo in an extension or path, or follows a broken link, a spelling check can often recover. Since the mistyped request would otherwise produce a 404 error, a spelling module can step in at that point and try to match the file or directory name the user most likely meant. If the file and directory names in a site are reasonably distinct, this last-ditch effort can correctly resolve many typos; when no plausible match exists, the usual 404 is returned. Providing this kind of "Did you mean X?" correction only requires installing a server filter like mod_speling for Apache or URLSpellCheck for IIS. Performance is not a concern, because the correction filter is called only when a 404 error would otherwise occur, and serving the right page is worth more than the minor processing saved by returning a plain 404. In short, there is no reason not to do this, and it is surprising that this feature is not built into all modern web servers.
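
The correction logic itself is small. The Python sketch below uses the standard library's difflib similarity matching as a stand-in for whatever algorithm mod_speling or URLSpellCheck actually uses, and it assumes a three-way outcome: redirect when exactly one name is close, offer a "Did you mean?" list (HTTP 300 Multiple Choices) when several are, and fall back to the ordinary 404 otherwise. The function name and the 0.8 similarity cutoff are hypothetical.

```python
import difflib
from typing import List, Tuple


def correct_on_404(
    requested: str, existing: List[str], cutoff: float = 0.8
) -> Tuple[int, List[str]]:
    """Decide how to answer a request for a file name that does not exist.

    Returns (status, names):
      (301, [name])  exactly one close match: redirect to it
      (300, names)   several close matches: offer a "Did you mean?" list
      (404, [])      nothing close enough: the ordinary 404
    """
    # Compare case-insensitively, but return the names as stored.
    by_lower = {name.lower(): name for name in existing}
    matches = difflib.get_close_matches(
        requested.lower(), list(by_lower), n=5, cutoff=cutoff
    )
    names = [by_lower[m] for m in matches]
    if len(names) == 1:
        return 301, names
    if names:
        return 300, names
    return 404, []
```

Because the function runs only after the normal lookup has failed, requests for existing files pay nothing for it, which is the performance argument made above.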

Conclusions

Most of the tips presented here are fairly straightforward, with the partial exception of URL cleaning and rewriting, and all of them can be accomplished with a reasonable amount of effort. The result should be cleaned URLs that are short, understandable, permanent, and free of implementation details, which should significantly improve the usability, maintainability, and security of a web site. Developers and administrators who object to next generation URLs will most likely point either to the performance cost of the server filters used to implement them or to search engine compatibility. As to the former, many of the required technologies are quite mature in the Apache world, and their newer IIS equivalents are usually explicitly modeled on the Apache originals, which bodes well.

As to the search engine concerns, Google fortunately has not shown any problem so far with cleaned URLs. At this point, the main obstacle to the adoption of next generation URLs is simply that so few developers know they are possible, while some who do are too comfortable with the status quo to explore them in earnest. This is a pity, because while these improved URLs may not be the mythical URN-style keyword always promised to be just around the corner, they can substantially improve the web experience for users and developers alike over the long run.

Further Resources

Articles

Numerous articles have been written about the need for clean URLs. A few of the more prominent ones are cited here.

Apache Tools

For Apache, nearly all modules can be found at modules.apache.org.

Links to useful information about mod_rewrite can be found at modrewrite.com.

A good overview of content negotiation on Apache can be found at httpd.apache.org/docs/content-negotiation.html.

Microsoft IIS Tools

IIS does not have quite the module culture that Apache does, but the site www.iismodules.com lists many commercial and freely available modules, and www.iisanswers.com, www.iisfaq.com, and www.iis-resources.com have related links and more detailed information on using filters with IIS.

The specific commercial IIS products mentioned in the article include URLSpellCheck, ISAPI Rewrite for IIS, PageXchanger, and w3compiler.

The authors would encourage submission of other tools and articles to improve the article's resource listing.


Article Disclaimer: The SEO Consultants Directory does not endorse the opinions and/or facts expressed by members who provide marketing articles for our site. These search engine optimization and search engine marketing articles are here for you to review and make your own decisions.

SEO Consultants Directory