I maintain a blog (dalqe.com) backed by MySQL that I update through a form. Occasionally, when I paste text into the form from another source (especially an email or a Word document), some non-text characters come along with it. This doesn't cause any visible problem, but since I am converting the site to XHTML I have noticed that these invisible characters keep my pages from validating (they cannot be identified as part of the character set). So, is there a way to strip everything but plain text from my form input before inserting it into the database? I'm already using htmlspecialchars().