Directory and File Naming Conventions



2009-02-11 - Years ago, when Blogs and other publishing platforms were becoming popular, someone, somewhere, decided that using the <title> of the Blog Post for the website address (URI) was a good idea. Little did they know that they would be responsible for a cottage industry that developed to help manage those long, unruly, and totally unfriendly URIs.

Table of Contents

  1. URI Shortening Services
  2. 72 Characters
  3. HTTP Status Code: 414 Request-URI Too Long
  4. Flat Structure vs. Category Structure
  5. Roll Your Own Shortened URIs
  6. Social Media and 140 Character SMS Limits
  7. Date Based URIs
  8. Related: Google News and the 3/4 Digit Rule
  9. Summary
  10. More Information about URIs

URI Shortening Services

Who would have known that such a service would ever be needed? And who would have ever thought that a service like Twitter would become one of the most popular means of communication amongst peers? What does Twitter have to do with this? Read on: that 140 character SMS limit is surely wreaking havoc on everyone using long hyphenated URIs. These publishers are forced to use a URI Shortening Service, thereby inserting a middle man that sits outside the brand domain. From my perspective, this is not good practice and may have negative side effects in the overall scheme of things.


72 Characters

Did you know that not all emails are read and/or sent in HTML mode? Did you also know that when email is sent in Text Mode, or converted to text during its travels, lines within the message are commonly wrapped at 72 characters? For me, that is the maximum URI length I'd work with to ensure that my links do not break along the way. And I'm surely not going to use a URI Shortening Service to present what looks like a spoofed URI string of some sort, which may cause users not to click the link. I'm serious: there are real challenges in this area when your links look like the links in the email spam that many are used to receiving. It is a cause and effect that you as a Publisher should be proactive in avoiding.

Another challenge with email comes into play when the replying to and/or forwarding of the original message begins. Depending on the user's email settings, the original message may be converted to plain text and each line prefixed with a quote marker (> ). When this happens, it takes away from the 72 characters you started with: the marker and a space now precede each line of the original message. In this scenario, we've just lost 2 of our original 72 character maximum, so now we're down to 70.

What does a 70 character URI look like? Here are some examples using our domain which is 30 characters to start. That means we have a maximum of 40 characters to work with in our directory and file naming conventions before wrapping occurs.

That last 66 character example is from an older article posted here at the directory by Gord Collins. At that time, we were not 100% certain how we were going to move forward with directory and file naming conventions. I was not comfortable at all with the number of hyphens appearing and when we hit that third hyphen, I made the decision to start working towards a more intuitive shorter URI structure which in turn forced me to tighten up the overall taxonomy.
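The character budget described above is simple arithmetic, and a small helper can make it explicit. The following is a minimal sketch, not a tool from this site: the function names are hypothetical, and the 72 character width and two character quote prefix are the figures used in this article rather than fixed rules.

```python
LINE_WIDTH = 72      # plain-text line width used in this article
QUOTE_PREFIX = "> "  # marker added to each line by a reply or forward
DOMAIN = "http://www.SEOConsultants.com/"  # 30 characters


def path_budget(domain: str = DOMAIN, quote_levels: int = 1) -> int:
    """Characters left for directory and file names before the line wraps."""
    return LINE_WIDTH - quote_levels * len(QUOTE_PREFIX) - len(domain)


def survives_quoting(uri: str, quote_levels: int = 1) -> bool:
    """True if the URI still fits on one line after being quoted."""
    return len(uri) <= LINE_WIDTH - quote_levels * len(QUOTE_PREFIX)


if __name__ == "__main__":
    print(len(DOMAIN))      # length of the domain portion
    print(path_budget())    # characters left after one level of quoting
    print(path_budget(quote_levels=2))
```

Each additional round of replying or forwarding costs another two characters, so a URI that barely fits at one quote level will wrap at the next.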

TITLE Truncation in the SERPs ...

Page titles in the SERPs will normally truncate at around 70 characters depending on the word composition at the point of truncation. Yahoo! recommends 67 characters as a limit.

More important, search engines use titles to index web sites, and often display them in search engine results. To make your page most appealing to search engines, we recommend that you limit your page title to 67 characters and do not include images in the page title area.

Note: The 67 character limit for the TITLE Element is a Yahoo! suggestion and is not a hard rule although the above verbiage from Yahoo! would tend to make you think otherwise. Longer character counts perform just fine.

We've seen Google state in writing 2-20 words, while never mentioning character counts, for article title lengths in various services that they provide such as Google News. How that may translate over to the TITLE Element is beyond the scope of this article.

It is always a good practice to target your primary keyword phrases at the beginning of the TITLE. Well balanced forward and reverse thinking is beneficial in this area.
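To see why primary keyword phrases belong at the front of the TITLE, it can help to preview what a truncated title might look like. The sketch below is an assumption about how truncation at a word boundary could behave; it is not how any particular search engine is known to do it, and the function name and the 70 character default are illustrative only.

```python
def truncate_title(title: str, limit: int = 70) -> str:
    """Cut a title at the last whole word that fits within `limit`."""
    if len(title) <= limit:
        return title
    cut = title[:limit]
    # If the cut landed mid-word, back up to the previous space.
    if not title[limit].isspace():
        head, sep, _ = cut.rpartition(" ")
        if sep:
            cut = head
    return cut.rstrip() + " ..."


if __name__ == "__main__":
    print(truncate_title("Directory and File Naming Conventions"))
    print(truncate_title("alpha beta gamma", limit=12))
```

Whatever sits past the cut point is lost to the searcher, which is the practical argument for leading with the words that matter most.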


HTTP Status Code: 414 Request-URI Too Long

Did you know that there is an HTTP Status Code that the server can return if your URI exceeds a certain length? The HTTP/1.1 specification (RFC 2616) describes it this way:

The server is refusing to service the request because the Request-URI is longer than the server is willing to interpret. This rare condition is only likely to occur when a client has improperly converted a POST request to a GET request with long query information, when the client has descended into a URI "black hole" of redirection (e.g., a redirected URI prefix that points to a suffix of itself), or when the server is under attack by a client attempting to exploit security holes present in some servers using fixed-length buffers for reading or manipulating the Request-URI.
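The check behind a 414 response can be sketched in a few lines. This is an illustration only: the 2048 character limit and the function name are assumptions, since each server chooses its own maximum, and a real server performs this check while parsing the request line.

```python
from http import HTTPStatus

MAX_REQUEST_URI = 2048  # hypothetical limit; real servers set their own


def status_for(request_uri: str, limit: int = MAX_REQUEST_URI) -> HTTPStatus:
    """Return 414 when the Request-URI is longer than the server accepts."""
    if len(request_uri) > limit:
        return HTTPStatus.REQUEST_URI_TOO_LONG
    return HTTPStatus.OK


if __name__ == "__main__":
    print(int(status_for("/uris/")))
    print(int(status_for("/search?q=" + "a" * 3000)))
```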


Flat Structure vs. Category Structure

Flat Structure
  Content resides at the root level of the website and presents a flat horizontal structure.
  Example: /document

Category Structure
  Content is categorized into sub-directories and may use the root for top level categories. This type of structure presents both horizontal and vertical categorization allowing for maximum scalability including URI management and user friendliness.
  Examples: /topcat/, /topcat/document, /topcat/subcat/document, /topcat/subcat/subcat/document

When naming categories, I try to keep them to one primary top level keyword. If two or even three keywords are necessary, one or maybe two hyphens may be acceptable. But, I'll work my magic and figure out some way to categorize a destination using single word category paths. There is a place for everything and everything in its place.

I see many hopping on the Flat Structure bandwagon, and I'm here to tell you that the future holds quite a few challenges for you. I'll let you figure out all the technical details, but the foremost issue you will face in a flat structure is file naming conventions. The cons far outweigh the pros in this situation. You may also end up with a plethora of multi-hyphenated-URIs-which-may-not-be-real-user-friendly. And I don't think they are really as SEO friendly as some claim them to be.

I feel that most, if not all, websites with sufficient content SHOULD have at least one sub-directory level. This allows you to categorize your content and removes the naming restrictions that a flat structure imposes. You can still use the flat structure concept, but only for top level pages; those should always be at the top of the structure (top of the click path), whether at the root or within a sub-directory. Click paths will determine the structural flow of your website.
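The two things I watch when naming paths, depth and hyphen count, are easy to measure. Here is a minimal sketch of such an audit; the function name and the returned keys are hypothetical, and the judgment about how many hyphens are acceptable remains yours.

```python
def audit_path(path: str) -> dict:
    """Report directory depth and the worst hyphen count in a URI path."""
    segments = [s for s in path.split("/") if s]
    return {
        "depth": len(segments),
        "at_root": len(segments) <= 1,
        "max_hyphens": max((s.count("-") for s in segments), default=0),
    }


if __name__ == "__main__":
    for p in ("/document",
              "/topcat/subcat/document",
              "/multi-hyphenated-URIs-which-may-not-be-real-user-friendly"):
        print(p, audit_path(p))
```

Run against a list of existing paths, a report like this makes it obvious which documents sit at the root and which file names are carrying an entire title.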


Roll Your Own Shortened URIs

We are now in the process of creating a tutorial for Windows that will allow you to utilize ISAPI_Rewrite to develop your own URI Shortening routines. Take out the 301 middle man and end the fragmentation of your brand. For example, I've set up a quick shortening routine for this article. I can utilize a link like this http://www.SEOConsultants.com/2009/02/11 which will 301 to http://www.SEOConsultants.com/uris/. While not the most optimal example, since we use shorter URIs to begin with, it does illustrate a technique that most website owners can easily implement, whether the website is hosted on Windows or Apache (*nix) servers.

Here's a great example of our URI shortening routine at work on one of our directory level testing areas...

http://www.SEOConsultants.com/clicks/

The above permanently redirects (301) to a 76 character URI that is 23 levels deep...

http://www.SEOConsultants.com/s/e/o/c/o/n/s/u/l/t/a/n/t/s/d/i/r/e/c/t/o/r/y/
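The idea behind a same-domain shortening routine is a lookup from a short path to its long destination, answered with a 301. The sketch below expresses that lookup in Python purely for illustration. It is not ISAPI_Rewrite or Apache rule syntax, and the table and function names are hypothetical; only the two redirects are taken from the examples above.

```python
from typing import Optional, Tuple

HOST = "http://www.SEOConsultants.com"

# Short path -> long path, using the two redirects described above.
SHORT_TO_LONG = {
    "/2009/02/11": "/uris/",
    "/clicks": "/s/e/o/c/o/n/s/u/l/t/a/n/t/s/d/i/r/e/c/t/o/r/y/",
}


def resolve(path: str) -> Optional[Tuple[int, str]]:
    """Return (301, Location) for a known short path, else None."""
    target = SHORT_TO_LONG.get(path.rstrip("/"))
    if target is None:
        return None
    return 301, HOST + target


if __name__ == "__main__":
    status, location = resolve("/clicks/")
    levels = [s for s in location[len(HOST):].split("/") if s]
    print(status, len(location), len(levels))
```

Because the short and long forms live on the same host, the redirect never leaves the brand domain.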


Social Media and 140 Character SMS Limits

If you are relegated to utilizing long URIs and find yourself promoting your content within Social Media outlets or platforms that limit your character counts (usually the 140 SMS limitation), you may want to consider rolling your own URI Shortening Service. If you have a domain that falls within the reasonable character limits (*59 or less) and can manage a flat URI shortening routine, I'd suggest this option before using a third party service.

* Based on utilizing an ISO 8601 date string for the shortening routines. How you handle additional posts on the same date is your choice. You could do /2009/02/11, or /2009/02/11/abc, whatever works best for you from both usability and scalability perspectives. I surely don't want to make recommendations that may cause challenges for you in the long term.
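To get a feel for how much of a 140 character message a date-based short URI consumes, the following sketch builds one and reports the room left for text. The function names and the optional suffix handling are assumptions for illustration; the 140 limit and the /2009/02/11 pattern come from the discussion above.

```python
from datetime import date
from typing import Optional

SMS_LIMIT = 140
DOMAIN = "http://www.SEOConsultants.com"


def short_uri(published: date, suffix: Optional[str] = None,
              domain: str = DOMAIN) -> str:
    """Build a same-domain short URI from an ISO 8601 date."""
    path = "/" + published.isoformat().replace("-", "/")
    if suffix:  # e.g. a second post on the same date
        path += "/" + suffix
    return domain + path


def room_for_text(uri: str, limit: int = SMS_LIMIT) -> int:
    """Characters left for the message after the URI and one space."""
    return limit - len(uri) - 1


if __name__ == "__main__":
    uri = short_uri(date(2009, 2, 11))
    print(uri, len(uri), room_for_text(uri))
```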


Date Based URIs

I'm finding that dates are very important when working with publisher type content. It is only natural to utilize them in the URI for archiving purposes. And, they work well when developing a same domain shortening routine. Just remember to keep them as short as you can. Shorter URIs placed just right in SMS messages do not get converted in many instances. This allows you to post branded links to your quality content.

Note: Date based URIs can be formatted in a variety of ways. Some may use a continuous date string, others may categorize dates further due to content volume. Again, this is something you should give careful consideration to before launching a date based taxonomy.
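The two formatting approaches mentioned in the note can be compared side by side. This is a small sketch with hypothetical style names ("continuous" and "nested"); which one suits your content volume is a decision to make before launch.

```python
from datetime import date


def date_path(d: date, style: str = "nested") -> str:
    """Format a date as a URI path in one of two illustrative styles."""
    if style == "continuous":
        return d.strftime("/%Y%m%d")    # one segment, no further categories
    if style == "nested":
        return d.strftime("/%Y/%m/%d")  # year, month, day as sub-directories
    raise ValueError(f"unknown style: {style}")


if __name__ == "__main__":
    d = date(2009, 2, 11)
    print(date_path(d, "continuous"), date_path(d, "nested"))
```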


Related: Google News and the 3/4 Digit Rule

You'll also want to heed Google's advice here and pay very close attention to how they suggest the formatting of URI strings for Google News Publishers.

Google News (publishers) Help > Technical Requirements: Article URLs

Display a three-digit number. The URL for each article must contain a unique number consisting of at least three digits. For example, we can't crawl an article with this URL: http://www.example.com/news/article23.html. We can, however, crawl an article with this URL: http://www.example.com/news/article234.html. Keep in mind that if the only number in the article consists of an isolated four-digit number that resembles a year, such as http://www.example.com/news/article2006.html, we won't be able to crawl it.
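The rule quoted above can be turned into a quick pre-publication check. The sketch below is my reading of that text, not Google's implementation: the function name is hypothetical, and treating a four-digit number between 1900 and 2099 as one that "resembles a year" is an assumption.

```python
import re
from urllib.parse import urlparse


def has_article_number(url: str) -> bool:
    """True if the URL path holds a number of 3+ digits that is not year-like."""
    path = urlparse(url).path
    for run in re.findall(r"\d+", path):
        if len(run) < 3:
            continue  # too short to count
        if len(run) == 4 and 1900 <= int(run) <= 2099:
            continue  # assumption: this range "resembles a year"
        return True
    return False


if __name__ == "__main__":
    for u in ("http://www.example.com/news/article23.html",
              "http://www.example.com/news/article234.html",
              "http://www.example.com/news/article2006.html"):
        print(u, has_article_number(u))
```

The three URLs in the usage example are the ones from the quoted help text, so the check can be compared directly against the outcomes it describes.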


Summary

Using the title of your Blog Posts for file naming is not best practice from a variety of usability standpoints, as I've outlined above and referenced below. And please, no references to the Google Blog or any other trusted resource that utilizes the long multi-hyphenated URI structure. Just because they opted to inflict that usability nightmare upon their users is no reason for you to do the same. Take my advice: don't follow in this instance. Set your own standards, following best practices in this area.


More Information about URIs

  1. 2009-02-03 - HTML/XHTML Tips: URI Fragment Identifiers
  2. 2008-08-11 - URIs and Pascal Casing: A Study of Case
  3. Search Engine Friendly URLs: URL Rewriting
  4. W3: Cool URIs Don't Change
  5. Jakob Nielsen: URL as UI
  6. FAQs RFC2396: Uniform Resource Identifiers (URI): Generic Syntax
    (RFC2396 revises and replaces RFC1738 and RFC1808)



 


SEO Consultants Directory