Office 2000 HTML Filter 2.0: This is a handy little tool.
Once you have completed editing an HTML document in Word 2000 or Excel 2000, you can use the Office HTML Filter to remove the Office-specific markup tags from the final copy of the HTML document.
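To make "Office-specific markup" a little more concrete, here's a rough Python sketch of the kind of cleanup involved. It is my own illustration, not how the filter itself works. The patterns are assumptions about typical Word 2000 HTML output: `<o:p>`-style namespaced tags, `mso-` style declarations, and `<!--[if ...]>` conditional comments. The function name `strip_office_markup` is hypothetical.

```python
import re

# Assumed patterns for Office-specific markup in Word 2000 HTML output.
# Illustrative approximation only; NOT the Office HTML Filter's actual logic.
_CONDITIONAL = re.compile(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", re.S | re.I)
_OFFICE_TAGS = re.compile(r"</?[ovwx]:[^>]*>", re.I)  # e.g. <o:p>, <v:shape>
_MSO_DECL = re.compile(r"mso-[\w-]+\s*:\s*[^;\"']+;?\s*", re.I)  # mso-* CSS
_EMPTY_STYLE = re.compile(r'\s*style\s*=\s*"\s*"', re.I)  # style="" left behind


def strip_office_markup(html: str) -> str:
    """Hypothetical helper: naively strip common Office-only markup."""
    html = _CONDITIONAL.sub("", html)   # drop conditional comments / XML islands
    html = _OFFICE_TAGS.sub("", html)   # drop o:, v:, w:, x: namespaced tags
    html = _MSO_DECL.sub("", html)      # drop mso-* style declarations
    html = _EMPTY_STYLE.sub("", html)   # tidy up now-empty style attributes
    return html  # note: class names like MsoNormal are left untouched


sample = (
    '<p class=MsoNormal style="mso-margin-top-alt:auto">Hello<o:p></o:p></p>'
    "<!--[if gte mso 9]><xml><o:DocumentProperties></o:DocumentProperties>"
    "</xml><![endif]-->"
)
print(strip_office_markup(sample))  # <p class=MsoNormal>Hello</p>
```

Regexes are a blunt instrument for HTML (a body-text mention of "mso-" would get mangled, for one), which is a good argument for letting a purpose-built tool do the job.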
I took a 45-page Microsoft whitepaper on remote OS installations, saved it as HTML, then ran it through this tool. The file size dropped by almost half, and the result looked perfect in Firefox. In fact, you'd be hard-pressed to tell whether you were viewing it in Firefox or in Word itself.
The resulting HTML is quite clean, and the app adds an "Export To" menu in Word, so you can create HTML directly from Word without an extra step. Here's some detail on exactly what it does: what it removes and what it doesn't.
The thing installs as a DLL that Word VBA calls, which tells me there's got to be a way to script it. I think you could put this on your server, let users upload Word files, convert them to clean HTML on the fly, then present them on your Web site.
If you like this idea, try Dean Allen's Word HTML Cleaner. Someday he should open it up as a Web service.