How can I clean extra code out of Word HTML?

Microsoft Word has long been able to "save for web" or "save as HTML", and it copes reasonably well when you copy from Word and paste into applications like Microsoft Expression Web, or into web WYSIWYG editors like FCKeditor and TinyMCE, which are found in content management systems such as Joomla! and DotNetNuke.

The problem, however, is that the HTML Word creates is at best filled with extra, unnecessary code and tags. More often it is broken, bloated code that overrides your style tags / CSS and interferes with the plans you have for the rest of your page.

We've often needed tools to help clean Word HTML code, and here are just a few of the favorites we've used over time. Hope you find them useful.

  1. Microsoft Expression Web - Job = OK - For quick, basic cleaning, open your Word HTML in Expression Web, switch to code view, select all, then right-click and choose "optimize HTML". One of the options in the popup is to clean up "Word HTML". This does an OK job of removing some of the main bloat.
  2. http://www.textism.com/wordcleaner - Job = Pretty Fantastic, The BEST! You can clean very small files for free (handy for testing), and a modest subscription fee gets you Textism's full web application, which can clean the Word HTML code out of large documents like a PRO. You specify an .html file and run the webapp process. It then shows you the cleaned code, which you can copy/paste into a new document. This app is one of our favorites because it also converts non-standard characters (curly quotes, em and en dashes, Macintosh character issues, etc.) into their ASCII equivalents.
  3. http://www.algotech.dk/word-html-cleaner-input.htm#developer - Job = Great. This webapp also does a great job, and it has a nice option to "leave in" or "remove" the font tags, which we like. It does not replace extended characters with ASCII, though, so you'll have to do that afterwards if necessary.
  4. http://www.convertwordtohtml.com/ - Job = Pretty Fantastic! Best for a POWER Job : ) Download this desktop application and you'll be doing great in no time! It is a powerful application with templates and rules that you can create to customize the levels/styles of content that are cleaned and edited. The trial version is free to work with, and the full version is $99. Extra benefits include template functions for complex find/replace, converting email/web addresses to href links, custom rules and more!
  5. http://www.jafsoft.com/detagger/ - This is another desktop application that others have reported good results with. We haven't tried it, but at $29 it's a good option to test out.
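
If you'd like a feel for what these cleaners do under the hood, here is a rough Python sketch of the same kind of cleanup. It is not how any of the tools above actually work. The function name `clean_word_html`, the `keep_font_tags` option (modeled on the "leave in" / "remove" font tag choice mentioned above) and the exact list of patterns are our own assumptions for illustration, and regular expressions like these only handle simple cases, so treat it as a starting point rather than a replacement for a real cleaner.

```python
import re

# Typographic characters Word commonly inserts, mapped to plain ASCII.
# This table is an illustrative assumption, not a complete list.
ASCII_REPLACEMENTS = {
    "\u2018": "'",    # left single curly quote
    "\u2019": "'",    # right single curly quote
    "\u201c": '"',    # left double curly quote
    "\u201d": '"',    # right double curly quote
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # ellipsis
    "\u00a0": " ",    # non-breaking space
}


def clean_word_html(html: str, keep_font_tags: bool = False) -> str:
    """Hypothetical helper: strip common Word-only markup from an HTML string."""
    # Remove conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if.*?<!\[endif\]-->", "", html,
                  flags=re.DOTALL | re.IGNORECASE)
    # Remove Office namespace tags such as <o:p> and </o:p>, keeping their text
    html = re.sub(r"</?[a-z]+:[^>]*>", "", html, flags=re.IGNORECASE)
    # Remove class="Mso..." attributes
    html = re.sub(r"\s+class=(\"Mso[^\"]*\"|'Mso[^']*'|Mso\w+)", "", html,
                  flags=re.IGNORECASE)
    # Remove inline style attributes (aggressive: this drops ALL inline styles)
    html = re.sub(r"\s+style=(\"[^\"]*\"|'[^']*')", "", html,
                  flags=re.IGNORECASE)
    # Optionally remove <font> tags, keeping the text inside them
    if not keep_font_tags:
        html = re.sub(r"</?font\b[^>]*>", "", html, flags=re.IGNORECASE)
    # Swap typographic characters for their ASCII equivalents
    for char, ascii_equivalent in ASCII_REPLACEMENTS.items():
        html = html.replace(char, ascii_equivalent)
    return html


if __name__ == "__main__":
    sample = ('<p class="MsoNormal" style="margin:0in"><font face="Arial">'
              '\u201cHello\u201d \u2014 world<o:p></o:p></font></p>')
    print(clean_word_html(sample))
    # prints: <p>"Hello" -- world</p>
```

Pass `keep_font_tags=True` if you want the font tags left in. For anything beyond small snippets, a proper HTML parser or one of the tools listed above will be more reliable than regular expressions.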