Hi
How can I remove Microsoft Word formatting from text (using PHP) when submitting a web form?
Does anyone have a function for this?
Use a better web editing program than Word, and you could avoid the Word formatting altogether.
Dreamweaver has better form tools than Office.
alirezaok;10889741 wrote:
Hi
How can I remove Microsoft Word formatting from text (using PHP) when submitting a web form?
Does anyone have a function for this?
If outputting UTF-8 content, this seems to work well:
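<?php
// Escapes HTML special characters and converts Word "smart" punctuation
// to ASCII numeric entities, so the result is safe to output as UTF-8.
// The input must be single-byte text (Windows-1252 or Mac Roman): in UTF-8
// input, bytes such as chr(208), chr(209) and chr(147) are parts of
// multibyte characters and would be corrupted. Because both tables are
// applied, Windows-1252 letters that share those bytes (e.g. chr(209) = Ñ,
// chr(201) = É) are converted as well.
function filterText($text)
{
    $search = array(
        '&', '<', '>', '"',  // HTML special characters; '&' must come first
        chr(212), chr(213),  // Mac Roman left/right single quote
        chr(210), chr(211),  // Mac Roman left/right double quote
        chr(208), chr(209),  // Mac Roman en dash, em dash
        chr(201),            // Mac Roman ellipsis
        chr(145), chr(146),  // Windows-1252 left/right single quote
        chr(147), chr(148),  // Windows-1252 left/right double quote
        chr(150), chr(151),  // Windows-1252 en dash, em dash
        chr(133)             // Windows-1252 ellipsis
    );
    $replace = array(
        '&amp;', '&lt;', '&gt;', '&quot;',
        '&#8216;', '&#8217;',
        '&#8220;', '&#8221;',
        '&#8211;', '&#8212;',
        '&#8230;',
        '&#8216;', '&#8217;',
        '&#8220;', '&#8221;',
        '&#8211;', '&#8212;',
        '&#8230;'
    );
    // str_replace() applies the search/replace pairs in order, one after another.
    return str_replace($search, $replace, $text);
}
// Sample input encoded as Windows-1252 (\x91-\x94 are the curly quotes)
$test = "This here\x92s a test. \x93It is only a test.\x94 \x91The end.\x92";
header('Content-Type: text/html; charset="UTF-8"');
echo filterText($test);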
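The function above expects single-byte (Windows-1252 or Mac Roman) input. If the form page is served as UTF-8, browsers normally submit the text as UTF-8 too, and Word's smart punctuation then arrives as multibyte characters. A minimal sketch for that case, assuming UTF-8 input and PHP 7+ (for the `\u{...}` escapes); the name `filterUtf8Text` is hypothetical:

```php
<?php
// Hypothetical UTF-8 counterpart of filterText(): escapes HTML special
// characters and turns Word's "smart" punctuation into numeric entities.
// strtr() with an array makes a single pass over the string, so the '&'
// inside a replacement is never escaped again and key order does not matter.
function filterUtf8Text(string $text): string
{
    return strtr($text, [
        '&' => '&amp;',
        '<' => '&lt;',
        '>' => '&gt;',
        '"' => '&quot;',
        "\u{2018}" => '&#8216;', // left single quote
        "\u{2019}" => '&#8217;', // right single quote / apostrophe
        "\u{201C}" => '&#8220;', // left double quote
        "\u{201D}" => '&#8221;', // right double quote
        "\u{2013}" => '&#8211;', // en dash
        "\u{2014}" => '&#8212;', // em dash
        "\u{2026}" => '&#8230;', // ellipsis
    ]);
}

echo filterUtf8Text("This here\u{2019}s a test. \u{201C}It is only a test.\u{201D}");
```

Since the page is UTF-8 anyway, the curly characters could also be left as they are; converting them to entities mainly helps when the output has to stay pure ASCII.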
In fact, the problem is that I want to remove all styles embedded in HTML tags.
alirezaok;10889747 wrote:
In fact, the problem is that I want to remove all styles embedded in HTML tags.
I thought you were referring to the MS-proprietary characters Word uses for things like "smart quotes".
I suppose you could probably use preg_replace(), but I'd need some specific examples of the sort of thing that needs to be removed.
Here is one part of the HTML:
<TR style="mso-yfti-irow: 2; mso-yfti-lastrow: yes">
<TD
style=" PADDING-BOTTOM: 0in; BORDER-LEFT: windowtext 1pt solid; WIDTH: 2.05in; PADDING-TOP: 0in; BORDER-BOTTOM: windowtext 1pt solid; BACKGROUND-COLOR: transparent; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt"
vAlign=top width=197>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><FONT
face="Times New Roman">jkl<o:p></o:p></FONT></P></TD>
<TD
STYLE="WIDTH: 2.05in; PADDING-TOP: 0in; BORDER-BOTTOM: windowtext 1pt solid; BACKGROUND-COLOR: transparent; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt"
vAlign=top width=197>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><FONT
face="Times New Roman">jkl<o:p></o:p></FONT></P></TD>
<TD
stYle="BORDER-RIGHT: windowtext 1pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #ece9d8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0in; BORDER-LEFT: #ece9d8; BACKGROUND-COLOR: transparent; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt"
vAlign=top width=197>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><o:p><FONT
face="Times New Roman"> </FONT></o:p></P></TD></TR>
I want to remove all the styles embedded in it (i.e. every style="......" attribute).
(Note: some of these attributes are written in capital or mixed-case letters, such as STYLE and stYle. I want to remove them too; all styles must be removed.)
$filtered = preg_replace('#style="[^"]*"#i', '', $text);
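The one-liner handles the sample above. A slightly more defensive variant is sketched below; the function name `removeStyleAttributesRegex` is hypothetical, and it has not been checked against every kind of Word output. It also removes the whitespace in front of the attribute, tolerates spaces around `=`, and accepts single-quoted values:

```php
<?php
// Hypothetical helper, a more defensive version of the one-liner:
//  - \s+       removes the whitespace before the attribute too, so
//              <TD STYLE="..."> becomes <TD> rather than <TD >
//  - \s*=\s*   tolerates spaces around the equals sign
//  - ("..."|'...') accepts double- or single-quoted values
//  - i         makes "style" match STYLE, stYle, etc.
// Requiring whitespace before "style" means names like mso-style are left alone.
function removeStyleAttributesRegex(string $html): string
{
    return preg_replace('/\s+style\s*=\s*("[^"]*"|\'[^\']*\')/i', '', $html);
}

$sample = '<TD STYLE="WIDTH: 2.05in" vAlign=top><P class=MsoNormal style="MARGIN: 0in 0in 0pt">jkl</P></TD>';
echo removeStyleAttributesRegex($sample);
```

Like any regular expression applied to HTML, it can still match text that merely looks like an attribute, so a real HTML parser is the safer choice for arbitrary input.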
Is there somewhere that explains all the different characters and search patterns that can be used with preg_replace()? ...all this stuff: '#style="[^"]*"#i'
php.net/preg_replace didn't seem to have much.
From the manual:
PCRE regular expression syntax
PCRE regular expression modifiers
I learned a lot of it in my previous experience using Perl, so I'm sure there are a lot of Perl regular expression references out there that could help, too.
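To tie the pattern from this thread to that syntax, here is `'#style="[^"]*"#i'` rewritten with the `x` (extended) modifier, which makes PCRE ignore literal whitespace in the pattern and treat `#` up to the end of the line as a comment. Because `#` starts a comment in this mode, the sketch uses `~` as the delimiter instead; the variable names are only illustrative:

```php
<?php
// The pattern '#style="[^"]*"#i' annotated piece by piece using the x modifier.
// '~' replaces '#' as the delimiter because in x mode '#' begins a comment.
$annotated = '~
    style=      # the literal text style=
    "           # an opening double quote
    [^"]*       # any run of characters that are not a double quote
    "           # the closing double quote
~ix';           // i: case-insensitive (STYLE, stYle); x: ignore whitespace and comments

$compact = '#style="[^"]*"#i';

$sample = '<TD stYle="BORDER-RIGHT: windowtext 1pt solid" vAlign=top>';
echo preg_replace($annotated, '', $sample), "\n";
echo preg_replace($compact, '', $sample), "\n";
```

Both calls print the same result. Note that the pattern removes only the attribute itself, so the spaces on either side of it remain.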
Hm... remove all style attributes from an HTML document...
Hoping the document is reasonably well-formed HTML...
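// Word HTML contains tags such as <o:p> that libxml reports as errors;
// collect them internally instead of printing warnings.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTMLFile('doc.html');
$xpath = new DOMXPath($doc);
// Select every element that has a style attribute. libxml's HTML parser
// lowercases attribute names, so STYLE and stYle are matched as well.
foreach ($xpath->query('//*[@style]') as $element)
{
    $element->removeAttribute('style');
}
$doc->saveHTMLFile('doc_without_styles.html');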
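The snippet above works on files, while the question is about text submitted through a web form. A sketch of the same idea applied to a string follows. The function name `removeStyleAttributesDom` and the wrapper `<div id="fragment-root">` are hypothetical, and the code assumes the submitted text is UTF-8 and that PHP's DOM extension is available:

```php
<?php
// Hypothetical helper: removes every style attribute from an HTML fragment
// (e.g. Word markup pasted into a form field) using PHP's DOM extension.
function removeStyleAttributesDom(string $html): string
{
    $doc = new DOMDocument();
    // Word markup contains tags such as <o:p> that libxml reports as errors;
    // collect them internally instead of printing warnings.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a known element so its nodes can be found again.
    // The meta tag tells libxml to read the input as UTF-8 (assumed encoding).
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
        . '<body><div id="fragment-root">' . $html . '</div></body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // libxml's HTML parser lowercases attribute names, so STYLE and stYle match too.
    foreach ($xpath->query('//*[@style]') as $element) {
        $element->removeAttribute('style');
    }

    // Serialize only the children of the wrapper, not the whole document.
    $root = $xpath->query('//div[@id="fragment-root"]')->item(0);
    $result = '';
    foreach ($root->childNodes as $child) {
        $result .= $doc->saveHTML($child);
    }
    return $result;
}

echo removeStyleAttributesDom(
    '<P class=MsoNormal STYLE="MARGIN: 0in 0in 0pt"><FONT face="Times New Roman">jkl<o:p></o:p></FONT></P>'
);
```

The parser also normalizes the markup (lowercase tag names, quoted attribute values), and a fragment that starts in the middle of a table, like the `<TR>` sample above, may be restructured by it.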