Hi

How can I remove Microsoft Word formatting from text (using PHP) when submitting a web form?

Does anyone have a function for this?

    Use a better web editing program than Word, and you wouldn't have the Word formatting to remove in the first place.

    Dreamweaver has better form tools than Office.

      If outputting UTF-8 content, this seems to work well:

      <?php
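      // Escapes HTML special characters and maps Word "smart" punctuation to
      // numeric entities. The chr() values are single-byte codes from Mac Roman
      // (chr(201), chr(208)-chr(213)) and Windows-1252 (chr(133)-chr(151)), so
      // the input is expected in one of those encodings; on UTF-8 input some of
      // them match bytes inside multi-byte characters.
      function filterText($text)
      {
         $search = array (
            '&',       // must come first so the entities added below are not re-escaped
            '<',
            '>',
            '"',
            chr(212),  // Mac Roman left single quote
            chr(213),  // Mac Roman right single quote
            chr(210),  // Mac Roman left double quote
            chr(211),  // Mac Roman right double quote
            chr(208),  // Mac Roman en dash
            chr(209),  // Mac Roman em dash
            chr(201),  // Mac Roman ellipsis
            chr(145),  // Windows-1252 left single quote
            chr(146),  // Windows-1252 right single quote
            chr(147),  // Windows-1252 left double quote
            chr(148),  // Windows-1252 right double quote
            chr(150),  // Windows-1252 en dash
            chr(151),  // Windows-1252 em dash
            chr(133)   // Windows-1252 ellipsis
         );
         $replace = array (
            '&amp;',
            '&lt;',
            '&gt;',
            '&quot;',
            '&#8216;',
            '&#8217;',
            '&#8220;',
            '&#8221;',
            '&#8211;',
            '&#8212;',
            '&#8230;',
            '&#8216;',
            '&#8217;',
            '&#8220;',
            '&#8221;',
            '&#8211;',
            '&#8212;',
            '&#8230;'
         );
         // str_replace() pairs the arrays element by element and applies them in order
         return str_replace($search, $replace, $text);
      }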
      
      $test = <<<END
      This here’s a test. “It is only a test.” ‘The end.’
      END;
      
      header('Content-Type: text/html; charset="UTF-8"');
      echo filterText($test);
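Because filterText() works on single bytes, it can damage text that is already UTF-8 (for example, chr(208) is the first byte of many Cyrillic letters in UTF-8). If the form is submitted as UTF-8, which is an assumption about your page, a sketch along these lines searches for the whole multi-byte characters instead. filterTextUtf8() is a hypothetical name, and PHP 7+ is assumed for the "\u{...}" escapes.

```php
<?php
// Hypothetical helper: same idea as filterText(), but for UTF-8 input.
// Matching complete multi-byte characters is safe, because in valid UTF-8
// one character's bytes never appear inside another character.
function filterTextUtf8(string $text): string
{
    // Escape &, <, > and " first; ENT_SUBSTITUTE replaces invalid UTF-8
    // sequences instead of making htmlspecialchars() return an empty string.
    $text = htmlspecialchars($text, ENT_COMPAT | ENT_SUBSTITUTE, 'UTF-8');

    // Then map Word's typographic punctuation to numeric entities.
    return strtr($text, [
        "\u{2018}" => '&#8216;', // left single quote
        "\u{2019}" => '&#8217;', // right single quote
        "\u{201C}" => '&#8220;', // left double quote
        "\u{201D}" => '&#8221;', // right double quote
        "\u{2013}" => '&#8211;', // en dash
        "\u{2014}" => '&#8212;', // em dash
        "\u{2026}" => '&#8230;', // horizontal ellipsis
    ]);
}

echo filterTextUtf8("This here\u{2019}s a test. \u{201C}It is only a test.\u{201D}");
// prints: This here&#8217;s a test. &#8220;It is only a test.&#8221;
```

Non-Latin letters such as Cyrillic pass through unchanged here, which is the main difference from the byte-based version.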
      

        In fact, the problem is that I want to remove all the styles embedded in HTML tags.

          alirezaok;10889747 wrote:

          In fact, the problem is that I want to remove all the styles embedded in HTML tags.

          I thought you were referring to the MS-proprietary characters Word uses for things like "smart quotes".

          I suppose you could probably use preg_replace(), but I'd need some specific examples of the sort of thing that needs to be removed.

            This is one part of the HTML:

            <TR style="mso-yfti-irow: 2; mso-yfti-lastrow: yes">
            <TD 
            style=" PADDING-BOTTOM: 0in; BORDER-LEFT: windowtext 1pt solid; WIDTH: 2.05in; PADDING-TOP: 0in; BORDER-BOTTOM: windowtext 1pt solid; BACKGROUND-COLOR: transparent; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" 
            vAlign=top width=197>
            <P class=MsoNormal style="MARGIN: 0in 0in 0pt"><FONT 
            face="Times New Roman">jkl<o:p></o:p></FONT></P></TD>
            <TD 
            
            STYLE="WIDTH: 2.05in; PADDING-TOP: 0in; BORDER-BOTTOM: windowtext 1pt solid; BACKGROUND-COLOR: transparent; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" 
            vAlign=top width=197>
            <P class=MsoNormal style="MARGIN: 0in 0in 0pt"><FONT 
            face="Times New Roman">jkl<o:p></o:p></FONT></P></TD>
            <TD 
            stYle="BORDER-RIGHT: windowtext 1pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #ece9d8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0in; BORDER-LEFT: #ece9d8;  BACKGROUND-COLOR: transparent; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" 
            vAlign=top width=197>
            <P class=MsoNormal style="MARGIN: 0in 0in 0pt"><o:p><FONT 
            face="Times New Roman">&nbsp;</FONT></o:p></P></TD></TR>
            

            I want to remove all the inline styles embedded in it (i.e., remove every style="......" attribute).

            (Note: some of the style attributes are in capital letters, such as STYLE and stYle. I want to remove those too; every style must be removed.)

              $filtered = preg_replace('#style="[^"]*"#i', '', $text);
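The one-liner above should handle the sample, since the i modifier makes it case-insensitive. As a sketch only, a slightly stricter variant might also remove the whitespace in front of the attribute, allow spaces around "=", and accept single-quoted values. stripStyleAttributes() is a hypothetical helper name, and unquoted values (style=color:red) are still not handled.

```php
<?php
// Hypothetical helper: stricter version of '#style="[^"]*"#i'.
function stripStyleAttributes(string $html): string
{
    // \s+ requires whitespace before "style", so names such as "data-style"
    // are left alone; the i modifier makes STYLE and stYle match as well.
    return preg_replace('#\s+style\s*=\s*("[^"]*"|\'[^\']*\')#i', '', $html);
}

$sample = '<TD STYLE="WIDTH: 2.05in" vAlign=top><P class=MsoNormal stYle=\'MARGIN: 0in\'>jkl</P></TD>';
echo stripStyleAttributes($sample);
// prints: <TD vAlign=top><P class=MsoNormal>jkl</P></TD>
```

Like any regex over raw markup, this also removes a style="..." that happens to appear in the visible text, which a DOM-based approach avoids.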
              

                Is there somewhere that explains all the different characters and search patterns that can be used with preg_replace()? ...all this stuff: '#style="[^"]*"#i'

                php.net/preg_replace didn't seem to have much.
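The pattern syntax is PCRE, which the PHP manual describes in its PCRE "Pattern Syntax" and "Pattern Modifiers" pages rather than on the preg_replace() page itself. As an illustration, here is the same pattern rewritten with the x (extended) modifier so each piece can carry a comment; switching the delimiter to "~" is my own change, needed because "#" starts a comment in x mode.

```php
<?php
// '#style="[^"]*"#i' written out with the x modifier:
// whitespace in the pattern is ignored and "#" begins a comment.
$pattern = '~
    style="    # the literal text style="
    [^"]*      # then any run of characters that are not a double quote
    "          # then the closing double quote
~ix';          // i = case-insensitive, x = extended (comments allowed)

// Behaves the same as preg_replace('#style="[^"]*"#i', '', ...)
echo preg_replace($pattern, '', '<P STYLE="MARGIN: 0in">jkl</P>');
// prints: <P >jkl</P>
```

The leftover space in "<P >" is harmless in HTML; the delimiters (#...#) only mark where the pattern starts and ends.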

                  Hm... remove all style attributes from an HTML document...

                  Hoping the document is reasonably well-formed HTML...

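                  $doc = new DOMDocument();
                  // Word markup such as <o:p> is not valid HTML; collect the parser
                  // errors instead of letting them print as warnings
                  libxml_use_internal_errors(true);
                  $doc->loadHTMLFile('doc.html');
                  $xpath = new DOMXPath($doc);
                  // select every element that has a style attribute
                  foreach($xpath->query('//*[@style]') as $element)
                  {
                  	$element->removeAttribute('style');
                  }
                  $doc->saveHTMLFile('doc_without_styles.html');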
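Since the original question is about text submitted through a web form, the same DOM approach can be applied to a string with loadHTML(). This is a sketch under some assumptions: the dom extension is available, the input is UTF-8 (the meta tag prefix is a common workaround so libxml does not assume ISO-8859-1), and stripStylesDom() is a hypothetical name. libxml's HTML parser lower-cases attribute names, so STYLE and stYle are both caught by @style.

```php
<?php
// Hypothetical helper: remove every style attribute from an HTML fragment.
function stripStylesDom(string $html): string
{
    $doc = new DOMDocument();

    // Collect libxml errors (e.g. for Word's <o:p> tags) instead of warnings.
    $previous = libxml_use_internal_errors(true);
    // The meta tag tells libxml to read the input as UTF-8.
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Same XPath query as above: every element carrying a style attribute.
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*[@style]') as $element) {
        $element->removeAttribute('style');
    }

    // Serialize only the children of <body>, not the wrapper libxml adds.
    $out = '';
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $doc->saveHTML($child);
        }
    }
    return $out;
}

echo stripStylesDom($_POST['content'] ?? '<p STYLE="color:red">hi</p>');
```

Here 'content' is a placeholder form field name. Note that the parser normalizes the markup (tag names come back lower-cased, and a fragment such as a lone <TR> may be restructured), so the output will not be byte-for-byte the same as the input apart from the removed attributes.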
                  