2009-08-28 - Let me preface this with a bit of history. I started researching at the W3 in 1996. Around 1999/2000, it became a daily read while developing various Internet properties, applications, etc. ALL of the answers to our many questions pertaining to the markup languages we were utilizing such as HTML Elements and Attributes were available at the W3.
For those who have traversed the W3, you'll have to admit that it's an immense space of technical documents. If you are not aware of how to navigate the website effectively, it does present a daunting browsing experience. Much of that has changed over the years and many of the new documents are more simplified in their approach, almost made for the everyday Webmaster. They also contain many more working examples that help solidify the technical specifications.
A quick way to find references at the W3 is to use Google (or your favorite SE) and perform a site: search query, e.g. the importance of website validation site:w3.org, which will usually put you very close to the information you're searching for.
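If you run these site: searches often, a tiny helper can build the query for you. This is only a sketch: the function name is mine, and it assumes Google's ordinary search URL with a q parameter.

```python
from urllib.parse import urlencode


def site_search_url(terms, site):
    """Build a Google search URL restricted to one site (hypothetical helper)."""
    query = f"{terms} site:{site}"
    # urlencode escapes the spaces and the colon for use in a query string.
    return "https://www.google.com/search?" + urlencode({"q": query})


url = site_search_url("the importance of website validation", "w3.org")
print(url)
```

Paste the printed URL into your browser and you land on the same results as typing the query by hand.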
I SHOULD warn you in advance that if you do not fully assimilate the basics of the markup languages you are working with such as HTML, XHTML, and/or CSS, then this entire series of articles would be a waste of your valuable time.
2010-03-12 - Check out the SEO Website Validation Showdown
This series of articles combines the techniques of website validation and semantic markup into one encompassing topic. Since the two skillsets are closely related, these articles will naturally cross over. Prepare yourselves for hours of reading, or take the short way out and read the Summary on The Importance of Website Validation.
Validation is the process of utilizing a tool and/or manual human audit to check the validity of web documents. Most web documents are written using markup languages such as HTML or XHTML, along with CSS for presentation. There are technical specifications that define these languages and they include a machine-readable formal grammar. The act of checking a web document against these specifications is called validation.
From the W3:
Validating web documents is an important step which can dramatically help improving and ensuring their quality, and it can save a lot of time and money (read more on why validating matters). Validation is, however, neither a full quality check, nor is it strictly equivalent to checking for conformance to the specification.
One of the important maxims of computer programming is: 'Be conservative in what you produce; be liberal in what you accept.'
On 2009 January 29, olivier Théreaux from the W3 posted this topic to the W3C Questions and Answers Blog...
Valid sites work better(?) -
It makes me curious, however, to know what are the real life arguments in favor of valid, standard code today. Do you have an untold story of validation getting rid of an awful rendering glitch? Real life accounts of a search engine bump achieved by fixing the syntax of your HTML <head>? A typo in a CSS stylesheet that hours of glancing at code didn't show, but the validator did? A forgotten alt that would have lowered your search rank for an important keyword, or cost a big fee for non-accessibility?
Strong emphasis mine. olivier was the open-source lead, lead developer and community manager for the W3C Validators and Quality Tools. His contributions to the developer community are far reaching. And yes, he prefers that the "o" in olivier be lowercase.
The above topic was the fuel for this document from the W3 on why website validation is important. It's not your typical W3 page and worth the read.
Why Validate? -
This document attempts to answer the questions many people have regarding why they should bother with Validating their web sites and tries to dispel a few common myths.
I've given this much thought and I've come to the conclusion that I cannot explain the benefits of validation from an SEO perspective in one article, or maybe even two or three. We're talking years of reading technical documents at ALL levels of the W3, documents that provide a depth of understanding that goes beyond your everyday Surface SEO. This is the next step after all the SEO Plugins, now you need to understand what they are doing and how they may be harming your site, not helping it. This SHOULD have been the first step.
I've put together a short list to get you started on understanding the intricacies of HTML markup and the reasons for validation. If you do not read the associated content, you will NOT fully understand the core concepts of validation and why it is important to utilize proper semantic markup in ALL of your development projects.
You MUST first understand HTML in its basic form. You'll also need to understand the basic concepts of CSS as the two languages work hand in hand.
If you are already familiar with the below basic information, skip to the next section on how Validation Affects SEO.
Note: Are you really sure you understand everything below? If you can pass these two very basic SEO Tests with a 100% score, you can most likely skip to the next section.
The primary basic example I typically use of how validation affects SEO is...
required attribute "alt" not specified
/images/file.jpg" width="72" height="108" />
The attribute given above is required for an element that you've used, but you have omitted it.
In the above fictional example, the alt attribute has been omitted by the Webmaster. This may be one of 20 cascading HTML validation errors that refer to alt attributes being omitted. Maybe 10 of those are images used for category links across the top of the page. Of course most of us would avoid using imagery in this fashion unless we had an alternative method of presenting that content, which we do.
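You don't need the full validator to spot this particular error. The sketch below uses Python's standard html.parser to list every image that has no alt attribute at all. The class and function names are my own, the second image in the sample is made up, and this is a rough check, not a replacement for the W3 validator.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> that has no alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # XHTML-style <img ... /> tags are routed here as well by default.
        attrs = dict(attrs)
        if tag == "img" and "alt" not in attrs:
            self.missing.append(attrs.get("src"))


def find_images_missing_alt(markup):
    finder = MissingAltFinder()
    finder.feed(markup)
    finder.close()
    return finder.missing


sample = (
    '<p><img src="/images/file.jpg" width="72" height="108" />'
    '<img src="/images/logo.png" alt="Logo" /></p>'
)
print(find_images_missing_alt(sample))  # ['/images/file.jpg']
```

An image with an empty alt (two quote marks, nothing between) is not reported, since the attribute is present.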
alt Attribute (a basic example)
We can utilize the alt attribute to replicate exactly what the image reads. No keyword stuffing, but an exact duplicate of the text contained in the image. You CANNOT misuse or abuse alt attributes without it affecting the user experience and your optimization strategies in some form or fashion. Note that I use the term optimization by itself without the search engine preface, we are appealing to more than search engines when optimizing our content.
Content is "equivalent" to other content when both fulfill essentially the same function or purpose upon presentation to the user.
In reference to the above note, a graphic <h1> using alternative text per the technical specifications is treated the same as an <h1> that is plain text. There is no need for designers to jump through hoops to try and serve anything else.
This is correct semantic markup properly interpreted by crawlers based on the UAAG...
<h1><img src="/images/the-importance-of-website-validation.png" width="200" height="24" alt="The Importance of Website Validation" /></h1>
And is treated the same as...
<h1>The Importance of Website Validation</h1>
Another example of using images in place of text typically occurs in primary navigation menus either at the top or left of the document. In that scenario, you might have semantic markup that looks like this...
<li><a href="/parts/" title="Parts Department"><img src="/images/parts.png" width="160" height="24" alt="Parts" /></a></li>
<li><a href="/accessories/" title="Product Accessories"><img src="/images/accessories.png" width="160" height="24" alt="Accessories" /></a></li>
<li><a href="/service/" title="24 Hour Emergency Service"><img src="/images/service.png" width="160" height="24" alt="Service" /></a></li>
Which may be interpreted something like this from a semantic viewpoint. My example below takes into consideration ALL of the semantic markup from the example above.
[A title: Parts Department]
[IMG src: /images/parts.png]
[IMG alt: Parts]
[A title: Product Accessories]
[IMG src: /images/accessories.png]
[IMG alt: Accessories]
[A title: 24 Hour Emergency Service]
[IMG src: /images/service.png]
[IMG alt: Service]
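For those who want to reproduce that semantic view themselves, here is a minimal sketch using Python's standard html.parser. The class and function names are my own invention, and the bracketed output format simply mirrors my listing above. It is not how any particular crawler actually works.

```python
from html.parser import HTMLParser


class SemanticExtractor(HTMLParser):
    """Record link titles plus image src/alt values in document order."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("title"):
            self.lines.append(f"[A title: {attrs['title']}]")
        elif tag == "img":
            self.lines.append(f"[IMG src: {attrs.get('src', '')}]")
            if "alt" in attrs:
                self.lines.append(f"[IMG alt: {attrs['alt']}]")


def extract_semantics(markup):
    extractor = SemanticExtractor()
    extractor.feed(markup)
    extractor.close()
    return extractor.lines


nav = (
    '<li><a href="/parts/" title="Parts Department">'
    '<img src="/images/parts.png" width="160" height="24" alt="Parts" /></a></li>'
)
for line in extract_semantics(nav):
    print(line)
```

Fed the Parts item from the menu above, it prints the first three lines of my listing. Remove the alt attribute from the markup and the third line disappears, which is exactly the point.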
Content is "equivalent" to other content when both fulfill essentially the same function or purpose upon presentation to the user. I will continue to reinforce this method of thinking throughout this article.
I've used the title attribute in the above example as I feel this is proper use of markup. You wouldn't assign the title attribute to the image, you would assign it to the a href as that is what you are describing to the visitor using assistive technologies. There ARE instances where you would assign both alt and title attributes to an image.
alt attributes are designed to be displayed when images are disabled.
title attributes are designed to be displayed on hover and are referred to as advisory information. They can be used on almost ALL HTML/XHTML Elements.
Do search engines take the title attribute into consideration? I don't know, do they? It doesn't matter to me, my users may take it into consideration and that is the priority. Whether or not these types of attributes prove to be of SEO value, they would NOT work if you don't use the formulas properly, also referred to as best practices in HTML authoring. More about the value of title attributes and other semantic markup.
alt Attributes and The SEO Sniff Test
Did you know that some users surf with images off? I understand they are quite popular these days in certain circles. How is that user, with images disabled, supposed to navigate your website if there are no appropriate alt attributes assigned to those primary category images? Remember, that user could be Googlebot, Slurp, MSNBot or any other UA designed to index and crawl data. It could also be a visitor using assistive software technologies to access your ecommerce site.
Content is "equivalent" to other content when both fulfill essentially the same function or purpose upon presentation to the user.
If you're using FireFox, download the Web Developer Toolbar Extension, you'll thank me later.
Example screenshot below shows the Images > Disable Images option from the Web Developer Toolbar. I use this frequently when performing what I call SEO Sniff Tests. You can tell a lot about a website when you browse with images off. It's like looking in the back seat of someone's automobile with all the fast food garbage bags (third party plugins), wrappers (table code bloat), empty soda cans (missing alt attributes), etc.
Without website validation, you would not be aware of the above poor coding practices. You would not know that you may be missing out on accessibility and optimization opportunities. You may not know that you're about ready to get served with a $6 million lawsuit because your website is inaccessible to the blind.
alt attributes are mandatory for validation and for SEO. They are also mandatory for Government compliance in certain countries e.g. AU, CA, CH, DE, DK, ES, EU, FI, FR, HK, IL, IN, IT, JP, NZ, PT, UK, and the US.
If images are used for presentation purposes e.g. spacer.gif, those MUST be assigned an empty alt attribute (e.g. alt=""), that's two quote marks with no space in between. You don't want a blank alt attribute (e.g. alt=" "), you want what is officially referred to as an empty string (e.g. alt="").
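The difference between missing, empty and blank is easy to check in code. Here is a hedged sketch using Python's standard html.parser. The labels and function names are mine, and treating a bare alt with no value as empty is my assumption.

```python
from html.parser import HTMLParser


def classify_alt(attrs):
    """Label an image's alt attribute as missing, empty, blank or text."""
    if "alt" not in attrs:
        return "missing"
    value = attrs["alt"]
    if value is None or value == "":
        return "empty"   # alt="" is what a presentational image should carry
    if value.strip() == "":
        return "blank"   # alt=" " is whitespace only, not an empty string
    return "text"


class AltClassifier(HTMLParser):
    def __init__(self):
        super().__init__()
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.results.append(classify_alt(dict(attrs)))


def classify_images(markup):
    parser = AltClassifier()
    parser.feed(markup)
    parser.close()
    return parser.results


sample = (
    '<img src="/images/spacer.gif" alt="" />'
    '<img src="/images/spacer.gif" alt=" " />'
    '<img src="/images/spacer.gif" />'
    '<img src="/images/parts.png" alt="Parts" />'
)
print(classify_images(sample))  # ['empty', 'blank', 'missing', 'text']
```

Only the first form is right for a presentational image. The second and third are the ones to hunt down.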
If you do not understand the above information, go back to Accessibility.
Note: If you are still using spacer.gif in 2010, you're a prime candidate for learning about HTML, CSS, validation and website accessibility, they're all closely related. By the way, spacer.gif are so 80s/90s.
The first errors returned in a W3 Validation Report may contain those which reside within the <head></head> element of your document. Validation is performed in a top to bottom fashion with errors listed in a cascading order. You'll want to pay very close attention to what is happening here. While most of the errors in this area may be recovered from, why would you leave the fate of your indexing and crawling to an error recovery routine?
Note: Did you know that the start and end tags of the <body> element are optional in HTML 4.01 based on the W3 Specifications? (XHTML requires both.)
Error routines are very robust these days, they MUST be with the amount of tag soup being generated by Webmasters. Be diligent in wanting to know about what you see. Even though your website looks wonderful at the browser level, have you ever wondered what it may look like to a user with disabilities and/or a bot?
That's the part that is continually overlooked by today's Surface SEO, the Novice, one who is just beginning their career and learning about the importance of semantic markup and validation. They're working towards becoming a professional and can usually offer solid basic advice for what they SEE while browsing your site VISUALLY. Unfortunately I don't think that is going to be sufficient for websites being developed to compete in the 21st Century.
These types of SEOs will suggest that you install a CMS and build your business around it, which is typically a solid suggestion if you fit the criteria. You may be offered a wide variety of SEO Plugins that do one thing or another. You'll perform a few downloads, push a few buttons, watch a few processing screens and next thing you know, PRESTO, instant SEO! It ALL happens right there on the screen in front of you, automagically.
What many of them cannot do is pop the hood (aka analyze the HTML/CSS) and perform a few tests (a dyno) to see if your website is running at peak performance. The Surface SEO is looking at the paint job (suggesting an inferior wax product), the interior (using Armor All®), listening to your sound system (on AM Radio), and sniffing your exhaust (without analyzing its content).
They won't know that the malformed syntax in the <head></head> of your document may be causing potential crawl challenges (poor fuel performance). Nor will they realize that all the images in your state-of-the-art navigation system are void of alt attributes and/or other alternative content (running too lean).
For the Surface SEO, this type of micro-auditing (performance tuning) is not important. It is quite obvious when performing SEO Audits on some of the more popular SEO websites in the industry. I won't name any names, you know who you are. I'm doing that out of respect for YOU, my peers, even though I think some of YOU are major slackers in this area.
Not only should you be aware of the above oversights, you SHOULD also look for improper use of HTML Elements and Attributes via your SEO Plugins. An example of this would be the use of title attributes on links...
<a href="/search-engine-optimization" title="Search Engine Optimization">Search Engine Optimization</a>
2009-08-21 - http://Twuna.com/Remove-Title Attributes Plugin from WordPress - STOP the Stuttering!
title Attribute (a basic example)
If you've read the documents referenced above, the term Stuttering would have come up multiple times. The Stuttering effect is caused by various improper coding practices that are plugged into your site e.g. the improper use of the title attribute as shown above. You should NEVER repeat the anchor text within the title attribute. In fact, you SHOULD only use the title attribute in specific circumstances where you are limited to textual description and need to add a short Tooltip for the visitor.
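Stuttering is easy to detect mechanically. The sketch below, again with Python's standard html.parser, flags any link whose title attribute merely repeats its anchor text. The names are my own, and comparing after collapsing whitespace and ignoring case is my assumption about what counts as a repeat.

```python
from html.parser import HTMLParser


def normalize(text):
    """Collapse whitespace and ignore case so near-identical strings match."""
    return " ".join(text.split()).casefold()


class StutterFinder(HTMLParser):
    """Flag links whose title attribute just repeats the anchor text."""

    def __init__(self):
        super().__init__()
        self._in_link = False
        self._title = None
        self._text = []
        self.stutters = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._title = dict(attrs).get("title")
            self._text = []

    def handle_data(self, data):
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            text = "".join(self._text)
            if self._title and normalize(self._title) == normalize(text):
                self.stutters.append(" ".join(text.split()))
            self._in_link = False


def find_stutters(markup):
    finder = StutterFinder()
    finder.feed(markup)
    finder.close()
    return finder.stutters


sample = (
    '<a href="/search-engine-optimization" title="Search Engine Optimization">'
    "Search Engine Optimization</a>"
    '<a href="/service/" title="24 Hour Emergency Service">Service</a>'
)
print(find_stutters(sample))  # ['Search Engine Optimization']
```

The second link passes because its title adds information the anchor text does not carry.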
Tooltip is the term used for the message that is displayed when hovering your cursor over an element that uses the title attribute. These attributes are usually found on anchor elements such as Read More, Click Here, or in navigational areas where character counts are limited and brevity is required. They may also be used in other assistive ways providing alternative content within web based applications.
This message will self-destruct in five seconds. Good luck Jim...
Just remember, you have approximately five seconds to make your point with the title attribute. That's how long the message (Tooltip) displays before disappearing. Brevity in your title attributes is also a requirement. Treat them like alt attributes where you have a suggested <80 characters to define your alternative content.
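To audit brevity across a page, something like the following will do. It is a sketch with Python's standard html.parser. The names are mine, the 80 character limit is my suggested figure from above and not a rule from any specification, and the long title in the sample is filler.

```python
from html.parser import HTMLParser

MAX_CHARS = 80  # suggested limit from this article, not a W3 requirement


class LongTextFinder(HTMLParser):
    """Report alt and title values that reach or exceed MAX_CHARS."""

    def __init__(self):
        super().__init__()
        self.too_long = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("alt", "title") and value and len(value) >= MAX_CHARS:
                self.too_long.append((tag, name, len(value)))


def find_long_text(markup):
    finder = LongTextFinder()
    finder.feed(markup)
    finder.close()
    return finder.too_long


sample = (
    '<a href="/service/" title="' + "x" * 90 + '">Service</a>'
    '<img src="/images/parts.png" alt="Parts" />'
)
print(find_long_text(sample))  # [('a', 'title', 90)]
```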
CSS Tip: I utilize...
border-bottom:.1em dashed #ccc;
padding:0 0 .2em 0;
...along with other visual clues to alert the user to a Tooltip.
Yes it is! And if anyone tells you otherwise, they are doing an injustice to themselves, to those for whom they may be performing SEO services, and to the SEO Industry in general. No, your website may not validate 100% due to HTML markup that is outside of your control, stuff like SEO Plugins and such. If you find a Plugin that is producing HTML markup that is incorrect, let the creator know. The two of you can brag about fixing it and move on to the next markup challenge, there will usually be more than a handful.
The very least you can do to work towards valid code is to correct the Errors and Warnings that are being generated by your markup, those within your control, that would be the first place to start. When something markup related breaks, you can narrow it down to third party code and not that of your own, in most instances. Third party code is typically the culprit in not being able to validate your website(s) 100%. You're dealing with a bunch of wannabe coders who have never read, let alone understood, the technical specifications for the markup they are producing. That's like having my Landscaper provide performance upgrades for my BMW.
I am. Many others are too. This isn't rocket science either. The technical specifications for HTML/XHTML and CSS markup have been in existence since the 90s and are the framework of today's Internet. Hopefully by showing you the basic examples above, you can clearly see how performing website validation can save you, the Professional SEO, from making amateur mistakes and missing out on micro-optimization opportunities. All of these add up, they are cumulative. Points can be added or, they can be subtracted.
Over the years I've written many articles on SEO at this particular level using the W3 as a primary source of information. You'll find my posts on WebmasterWorld where I Moderated 3 forums for 7 years (2001-2008). As the Administrator for the SEO Consultants Directory, I've contributed to our articles library since we launched in 2002 June.
You'll find a group of articles that I've recently written (2008-2010) which will be part of this website validation series moving forward. I will be providing 1Click references to sections within the below documents and other areas of the directory. It's easier to take this in sections instead of ALL at once, trust me on this.
I've put a lot of research and time into assembling all of these. I also practice what I preach, I'm in the trenches daily with YOU, I don't just write about this stuff, I do IT too. I hope you'll share in my enthusiasm and become a master of your trade. It all starts here...
If your brain doesn't hurt yet, and you have some more time for reading, my comments to a post at SEOBook.com on 2009-08-21 may be of interest to you as they directly tie in to the writing of this article on The Importance of Website Validation.
Aaron Wall (@aaronwall) from SEOBook.com says...
In April a web designer who came across our site gave me the following feedback 'I don't know how you can advertise your skills in SEO when such a vital part of a good quality site is valid markup. Your homepage has 40 errors when I just checked.'
To which I replied '...and yet I rank page 1 in Google for SEO. Who cares about valid code? Not me. And not Google. Oh well.'
The above attitude about writing valid code is rampant in the SEO Industry. Using SEOBook.com or Google.com as examples of websites that don't validate is an injustice due to their history and inbound link volume. Most of us understand that history and links typically trump all else.
Surface SEOs are incapable of reading and understanding HTML/XHTML/CSS. It's quite obvious when performing SEO Audits (including but not limited to validation and semantic extraction routines) on well known SEO Websites. Unfortunately that lack of understanding may work against them moving forward in this industry.
The bottom line? There are technical specifications and standards for writing machine readable formal grammar such as HTML/XHTML/CSS. If you don't adhere to those standards and guidelines, you leave the fate of your information retrieval to error recovery routines.
Think about the long term implications of poor coding practices. I don't see any benefits to producing invalid code. Do you? I don't see that much more time and/or money involved in producing valid code. Do you? So, what's the reasoning again for not writing valid code?
Main Article URI Short: http://Twuna.com/Code-Nazis-0
Important Note: As of 2009-08-29, the above Post at SEOBook.com has 534 Errors and 25 Warnings found while checking the document against an XHTML 1.0 Transitional DOCTYPE. 11 of those errors are for required attribute alt not specified on images that SHOULD have a descriptive alt attribute. 13 of those errors are for ID already defined. These are ALL errors that SHOULD NOT be present. Makes you wonder, huh?
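The ID already defined error is another one you can catch before the validator does. A minimal sketch with Python's standard library follows. The names are my own and the sample markup is made up for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Count how many times each id value appears in a document."""

    def __init__(self):
        super().__init__()
        self.ids = Counter()

    def handle_starttag(self, tag, attrs):
        value = dict(attrs).get("id")
        if value:
            self.ids[value] += 1


def duplicate_ids(markup):
    collector = IdCollector()
    collector.feed(markup)
    collector.close()
    # An id must be unique within a document, so any count above 1 is an error.
    return sorted(name for name, count in collector.ids.items() if count > 1)


sample = '<div id="nav"></div><div id="nav"></div><p id="intro"></p>'
print(duplicate_ids(sample))  # ['nav']
```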
If you've followed the links provided above and performed the due diligence in research that I have, you'll have found this section in the Techniques for WCAG 2.0. The three items listed under the Description for H88 summarize this topic on The Importance of Website Validation.
The objective of this technique is to use HTML and XHTML according to their respective specifications. Technology specifications define the meaning and proper handling of features of the technology. Using those features in the manner described by the specification ensures that user agents, including assistive technologies, will be able to present representations of the feature that are accurate to the author's intent and interoperable with each other.
There are a few broad aspects to using HTML and XHTML according to their specification.
- Using only features that are defined in the specification HTML defines sets of elements, attributes, and attribute values that may be used on Web pages. These features have specific semantic meanings and are intended to be processed by user agents in particular ways. Sometimes, however, additional features come into common authoring practice. These are usually initially supported by only one user agent. When features not in the specification are used, many user agents may not support the feature for a while or ever. Furthermore, lacking standard specifications for the use of these features, different user agents may provide varying support. This impacts accessibility because assistive technologies, developed with fewer resources than mainstream user agents, may take a long time if ever to add useful support. Therefore, authors should avoid features not defined in HTML and XHTML to prevent unexpected accessibility problems.
- Using features in the manner prescribed by the specification The HTML specification provides specific guidance about how particular elements, attributes, and attribute values are to be processed and understood semantically. Sometimes, however, authors use features in a manner that is not supported by the specification, for example, using semantic elements to achieve visual effects without intending the underlying semantic message to be conveyed. This leads to confusion for user agents and assistive technologies that rely on correct semantic information to present a coherent representation of the page. It is important to use HTML features only as prescribed by the HTML specification.
- Making sure the content can be parsed HTML and XHTML also define how content should be encoded in order to be correctly processed by user agents. Rules about the structure of start and end tags, attributes and values, nesting of elements, etc. ensure that user agents will parse the content in a way to achieve the intended document representation. Following the structural rules in these specifications is an important part of using these technologies according to specification.
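The third item, making sure the content can be parsed, is the easiest to test for XHTML since it must be well-formed XML. The sketch below uses Python's standard xml.etree.ElementTree. The function name is mine, and note the limits: well-formed is not the same as valid, and named entities such as &nbsp; will be reported as errors here because no DTD is loaded.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xhtml):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(xhtml)
    except ET.ParseError:
        return False
    return True


good = '<ul><li><a href="/parts/">Parts</a></li></ul>'
bad = '<ul><li><a href="/parts/">Parts</li></ul>'  # the <a> is never closed

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```

A document that fails this check is leaving its fate to error recovery routines. One that passes still needs the full validator to confirm it follows the specification.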
Hi Edward, just like to say thanks for joining the discussion regarding Validation on SEOBook.com last week. I have to admit that I was actually opposed to what you were saying but upon a good dose of reflection, I now see that you are right. Like any tradesman it's always good to know that people do take pride in their work and yes, focus on quality. My site had a combined 1200+ errors and warnings, I've spent hours and hours fixing 95 percent of it - and I'm thankful, believe me! Your follow up article: SEOConsultants.com/Validation/ will be a source of information for me for a very long time and I intend to pass it on.
Jeremy from Australia
URI Short: http://Twuna.com/Semantic-Data Extraction for The Importance of Website Validation (30 characters)