The big problem you face is that "HTML" is not really well defined in practice. If you can be sure the input is well formed according to some standard (XHTML, for example), then an XML parser might be able to handle it. But browsers tolerate all kinds of junk, and each one behaves differently when it encounters junk. So you first have to decide what you actually want. Do you want the information that would be displayed by IE or by Netscape? With scripting and applets turned on or off? Do you just want to strip off the tags? (PHP can do that with its built-in strip_tags function.) And what about all the includes, scripts, and other content that is not really HTML?
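
To make the difference concrete, here is a minimal sketch in Python (my choice of language for illustration, not something the question specified) using only the standard library. The names `is_well_formed_xml`, `TextExtractor`, and `extract_text` are made up for this example. It shows a strict XML parser rejecting sloppy markup, while a tolerant tag stripper still pulls out the text and skips the contents of script and style elements. It does not try to reproduce what any particular browser would display.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def is_well_formed_xml(markup):
    """Return True only if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def extract_text(html):
    """Strip tags from possibly malformed HTML and normalize whitespace."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


if __name__ == "__main__":
    sloppy = "<p>Hello <b>world</p><script>var x = '<i>no</i>';</script> bye"
    print(is_well_formed_xml(sloppy))  # the <b> is never closed
    print(extract_text(sloppy))
```

This only answers the "just strip the tags" version of the question. Anything generated by scripts, pulled in by includes, or rendered by applets never appears in the markup itself, so a parser like this will not see it.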