Actually, I would suggest NOT using XHTML until the browsers fully support it, which they currently don't. (Neither does PHP; look at the output that comes from highlight_string().) Really, there's no reason to jump straight into XHTML. If you just want to use style sheets, you can do that in HTML 4.01 and still allow for proprietary crap like contenteditable. Sure, it looks cool to have the little W3C XHTML validation image on your site, but apart from that the browsers aren't really noticing a difference.
Anyway, about the script: I think it's going to be a matter of a few regular expressions to do what you want. Find out which tags the MS is using, create regexes for them (PCRE would be better, because this will be SLOW anyway), and use preg_replace to fix them all up. AFAIK there isn't a script out there that will do that 😉
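
To give a rough idea of the preg_replace approach, here's a minimal sketch. I'm guessing at the tags: the namespaced <o:p> tags, the class="Mso..." attributes, and the <font> tags are only assumed examples, and clean_ms_markup() is a made-up name, so swap in whatever your MS output actually contains.

```php
<?php
// Hypothetical cleanup rules; the real patterns depend on the tags
// you actually find in the MS output.
function clean_ms_markup(string $html): string
{
    $rules = [
        // Drop namespaced tags such as <o:p> and </o:p> (assumed example),
        // keeping whatever text sits between them.
        '~</?[a-z]+:[a-z]+[^>]*>~i' => '',
        // Strip class="Mso..." attributes (assumed example).
        '~\s+class="?Mso[a-z0-9]*"?~i' => '',
        // Remove <font ...> and </font>, keeping the text inside.
        '~</?font\b[^>]*>~i' => '',
    ];

    // preg_replace accepts arrays and applies the patterns in order.
    return preg_replace(array_keys($rules), array_values($rules), $html);
}

$dirty = '<p class="MsoNormal"><font face="Arial">Hello<o:p></o:p></font></p>';
echo clean_ms_markup($dirty), "\n"; // prints <p>Hello</p>
```

Keeping the patterns in one array means you can add a rule per tag as you find them, without touching the rest of the function. Regexes won't cope with every bit of malformed markup, so check the output on a few real documents.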