I’ve been playing around with the W3C HTML validator, and I’ve found, sadly, that there’s no easy way to get this page to validate. There were some problems that I fixed, but when I try to validate against 4.01 Transitional, I get about 50 errors related to the use of “&” in URLs.
Apparently you’re supposed to use the HTML entity for the ampersand (“&amp;”) even in URLs. But since this entity isn’t present in the URL in the address bar of the browser, and that’s where you generally copy the URL from, how are you supposed to convert these without manually picking through every URL you use? You could try to get funky with regular expressions, but I can’t imagine that would work perfectly in every case.
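For what it's worth, here is a rough sketch of the regular-expression idea in Python (used here only for illustration, not the language this site runs on). The function name and the example.com URL are made up. It escapes any "&" that does not already look like the start of a character reference, so running it twice does no extra damage. It also shows why this can't be perfect: a query parameter that happens to look like an entity, such as "&copy;=1", would be left alone.

```python
import re

# Matches "&" that does NOT already start a character reference such as
# "&amp;", "&#38;" or "&#x26;".
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)")


def escape_bare_ampersands(text: str) -> str:
    """Replace each bare "&" with "&amp;", leaving existing references alone."""
    return BARE_AMP.sub("&amp;", text)


# Hypothetical URL copied from an address bar.
url = "http://example.com/search?q=tidy&page=2&lang=en"
print(escape_bare_ampersands(url))
```

The escaped string is what would go inside an href attribute; the browser turns each "&amp;" back into "&" before requesting the page.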
This brings up a larger point in that you can’t really expect to validate a site where a large part of the HTML of the page is provided by people other than the original Web developer. Every entry on this page — comprising the entire middle section — can be entered by someone else, and how can I make sure they’re entering valid HTML markup?
This is where HTML Tidy integration will work very well in PHP 5. Using this tool, you can validate HTML that people enter before you store it in the database, or before you output it. You can make sure all tags are closed, all tags match, etc. so perhaps you can hope for some sort of valid markup.
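To give a feel for the "all tags are closed, all tags match" check, here is a minimal stand-in written in Python with the standard html.parser module. This is not HTML Tidy and not the PHP 5 extension; the class and function names are invented for the example, and it only reports problems rather than repairing them. It is also stricter than HTML 4.01, which lets you omit some closing tags (for example on p and li).

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class TagBalanceChecker(HTMLParser):
    """Collects problems with unclosed or mismatched tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open tag names
        self.problems = []   # human-readable messages

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.problems.append(f"unexpected </{tag}>")


def check_fragment(fragment: str) -> list:
    """Return a list of tag problems; an empty list means the tags balance."""
    checker = TagBalanceChecker()
    checker.feed(fragment)
    checker.close()
    unclosed = [f"unclosed <{tag}>" for tag in checker.open_tags]
    return checker.problems + unclosed


print(check_fragment("<p>Hello <b>world</b></p>"))
print(check_fragment("<p>Hello <b>world</p>"))
```

A check like this could run on a submitted entry before it is stored, with anything that returns a non-empty list being rejected or handed to a real repair tool.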
But, in an even larger sense, does validation matter much? I’ve never gotten any comment from anyone about the validation of this site. So what if I’m throwing 50 errors because of ampersands in URLs? Can someone provide me with a valid (excuse the pun) reason why this matters?
I understand problems can occur from gross misuse of the HTML spec, but are all validation errors created equal? My apparent misuse of ampersands has got to rank pretty low on the sin list.
Tidying up your HTML with PHP 5: The next version of PHP includes an extension for HTML Tidy, so you can have every HTML document perfectly formatted on its way out the door by applying HTML Tidy to the output buffer. When the Tidy extension is installed, it can be automatically…
Yes, ampersands must be written as & in HTML (this includes attribute values). This is to differentiate them from the beginning of an entity. Naturally most browsers can cope with a lonely & in a URL.
However, I always make sure to write &. For one thing, I don't want 50 trivial errors to hide the 51st real error which might matter, and the HTML validator is useful in finding those too.
With a content management system I wrote some years ago (Onpage2.com), all user input was validated by HTML Tidy before it was stored as an XML snippet in the SQL database. You might not think too much of HTML validation, but if you use XML (to have it be XHTML in the end), validation is absolutely necessary. You really cannot handle XML objects which don't validate.
Talking about validation, just found some bugs in my current template. Bugs always reintroduce themselves if you don't constantly validate!
Good luck with your ampersands!
Just noticed you don't correctly escape ampersands in comments either, so some of the meaning above got lost -- you should always convert all "&" entered by the user to "&amp;"...
There is a plugin for MT that will perform the '&' to '&amp;' conversion for you, or you could hack one in yourself.
That's why the W3C recommends using semicolons rather than ampersands to separate key/value pairs in the query string. Quite a simple solution, really.
But what Web development languages support that syntax? (And by "support," I mean "automatically parse.")
Any, unless you're afraid of parsing the query string yourself.
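As a rough idea of what "parsing the query string yourself" involves, here is a small Python sketch. The function name and the sample query are made up for illustration. It splits on ";" instead of "&", decodes percent-escapes and "+" signs, and keeps repeated keys as lists.

```python
from urllib.parse import unquote_plus


def parse_semicolon_query(query: str) -> dict:
    """Parse "a=1;b=2" into {"a": ["1"], "b": ["2"]}."""
    params = {}
    for pair in query.split(";"):
        if not pair:
            continue  # skip empty segments such as a trailing ";"
        key, _, value = pair.partition("=")
        params.setdefault(unquote_plus(key), []).append(unquote_plus(value))
    return params


# Hypothetical query string using ";" as the separator.
print(parse_semicolon_query("q=html+tidy;page=2;tag=php;tag=w3c"))
```

The trade-off is the one raised above: links written this way need no "&amp;" escaping in the HTML, but the server-side code has to do its own parsing wherever the platform only splits on "&" automatically.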
Hi, I need a site like http://validator.w3.org. Can anyone help me get started building one?
Contact: sayfrndship@gmail.com. Thanks in advance.