I've got a database and some content editing tools for a site using XHTML (transitional)...

My problem is the use of special characters (trademark, registered, copyright, degrees). I can have my editors use the proper character entities when inserting/updating data, but how do I ensure that those values are kept in the forms when they are output back to the browser?

For example, an editor adds an attribute that requires the copyright symbol and uses the correct &# code. But when the product is viewed again, the browser interprets that code as the actual symbol. Therefore, when it gets updated again, the database receives the symbol rather than the code.

Normally this would not be an issue, but XHTML is not forgiving.

Any ideas on how to preserve the &# codes throughout the entire process?
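A minimal sketch of one common approach, assuming the stored text is UTF-8 (formTextarea and the field name are hypothetical): escape the stored value with htmlspecialchars() whenever it is written into a form field. The & of &#169; goes out as &amp;, so the browser displays the literal characters &#169; in the field and submits them back unchanged.

```php
<?php
// Hypothetical helper: write a stored value into a <textarea> so that any
// "&#nnn;" references it contains survive the trip through the browser.
function formTextarea($name, $storedValue)
{
    // htmlspecialchars() turns "&" into "&amp;", so "&#169;" is sent as
    // "&amp;#169;" and the browser shows (and later submits) "&#169;".
    $safeName  = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');
    $safeValue = htmlspecialchars($storedValue, ENT_QUOTES, 'UTF-8');
    return '<textarea name="' . $safeName . '">' . $safeValue . '</textarea>';
}

echo formTextarea('attribute', 'Copyright &#169; 2004'), "\n";
// <textarea name="attribute">Copyright &amp;#169; 2004</textarea>
```

The same call covers the value="..." attribute of a text input; ENT_QUOTES also escapes the quote characters that would otherwise end the attribute early.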

    So what is the difference between htmlentities() and htmlspecialchars()?

    Supposedly, htmlentities() translates "all" HTML character entities, but what does that mean? Even normal letters and numbers?

    Also, it looks like htmlspecialchars() translates the characters into the &name; form rather than the &#nnn; form.

    XHTML does not like amp, deg, copy, reg and requires you use 038, 176, 169, 174 respectively.

      Make your own conversion functions: one big array (char => xhtml value) + search & replace.
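A sketch of that suggestion, assuming UTF-8 input; toNumericRefs is a hypothetical name and the table is only a small illustrative subset. strtr() with an array does the whole search and replace in one pass.

```php
<?php
// Hypothetical helper: replace a few special characters with numeric
// character references. Assumes the input is UTF-8.
function toNumericRefs($text)
{
    $map = array(
        "\xC2\xA9"     => '&#169;',  // copyright sign
        "\xC2\xAE"     => '&#174;',  // registered sign
        "\xC2\xB0"     => '&#176;',  // degree sign
        "\xE2\x84\xA2" => '&#8482;', // trade mark sign
    );
    // strtr() never rescans text it has already inserted, and "&" is not in
    // the table, so references already present in the text pass through.
    return strtr($text, $map);
}

echo toNumericRefs("Acme\xE2\x84\xA2 \xC2\xA9 2004"), "\n";
// Acme&#8482; &#169; 2004
```

If the mbstring extension is available, mb_encode_numericentity() can convert every non-ASCII character this way without a hand-written table.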

        I'm trying the following, but it outputs the exact same string. I'm using PHP 4, but I don't see what the problem is...

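        // Before PHP 5.4 both functions assume ISO-8859-1 unless a charset is passed as the third argument.
        $str = "This is a © example & it is meant to test if this dämn this wo®ks.";
        $strA = htmlentities($str);     // also converts ©, ä and ® when the charset matches the file's encoding
        $strB = htmlspecialchars($str); // converts only &, <, > and quotes

        echo $str . "<br /><br />";
        echo $strA . "<br /><br />";
        echo $strB . "<br /><br />";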

          It is working; you just won't see it in your browser. Run that same code and then use "View Source" in your browser.
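If viewing the source gets tedious, another option is to send the page as plain text so the browser shows the entity references instead of rendering them. This is a sketch, assuming the script runs under a web server and nothing has been output yet. The string is the one from the test above, written with escape sequences so the file's own encoding doesn't matter.

```php
<?php
// Must be called before any output; under the CLI, header() has no effect.
header('Content-Type: text/plain; charset=UTF-8');

// "This is a © example & it is meant to test if this dämn this wo®ks."
$str  = "This is a \xC2\xA9 example & it is meant to test if this d\xC3\xA4mn this wo\xC2\xAEks.";
$strA = htmlentities($str, ENT_QUOTES, 'UTF-8');

echo $strA, "\n";
// This is a &copy; example &amp; it is meant to test if this d&auml;mn this wo&reg;ks.
```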

            I have a similar problem because I'm using a CMS and trying to get it to spit out XHTML. Formatting break tags and ordered lists isn't too hard, but HTML entities are kicking my butt.

            I've been using a modification of the code found in the PHP manual. Everything was working fine until recently. I dunno if my host upgraded or something... but it's PHP 4.3.4 and MySQL 4.0.18.

            What's happening recently is that the "&" is getting screwed up.

            "'&(?!amp);?'i",

            is what I've had for a while now...

            This all sounds screwy (I need to get some sleep), but when I check the SQL UPDATE statement before it goes into the database, it shows the HTML entities correctly, like it always used to. But what shows up in the database and on the result page is

            &amp ;&#112 ;   (spaces added so the forum doesn't render the entities)

            Has anyone else seen something like this? I'll post more info in the morning. I'd like to get to the bottom of this, since it's irritating to have it convert all the other special characters but not the &, so my page doesn't validate... Thanks for any help, everyone. 🙂 G'night

              XHTML does not like amp, deg, copy, reg and requires you use 038, 176, 169, 174 respectively

              hmm... are you sure?
              Where in the XHTML specification does it say that?

                Originally posted by katiliosk
                So what is the difference between htmlentities() and htmlspecialchars()?

                Supposedly, htmlentities() translates "all" HTML character entities, but what does that mean? Even normal letters and numbers?

                Y'know, the obvious thing to do here would be to try them and see what effect they have. "Character entities" in this case refers to those listed in the HTML specification, I think.
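Trying them side by side looks something like this (a sketch assuming UTF-8 input and an explicit ENT_QUOTES flag). Plain letters and digits pass through both functions untouched; htmlspecialchars() converts only &, <, >, " and (with ENT_QUOTES) ', while htmlentities() additionally converts every character that has a named HTML entity, such as © and é.

```php
<?php
$s = "abc 123 \xC2\xA9 \xC3\xA9 & <b>"; // "abc 123 © é & <b>" in UTF-8

$special  = htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
$entities = htmlentities($s, ENT_QUOTES, 'UTF-8');

echo $special, "\n";  // abc 123 © é &amp; &lt;b&gt;
echo $entities, "\n"; // abc 123 &copy; &eacute; &amp; &lt;b&gt;
```

Both emit named references such as &amp; and &copy;; the exception is the single quote, which htmlspecialchars() writes as &#039;.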

                XHTML does not like amp, deg, copy, reg and requires you use 038, 176, 169, 174 respectively.

                The XHTML specification says otherwise:

                http://www.w3.org/TR/xhtml1/#h-A2
                The XHTML entity sets are the same as for HTML 4, but have been modified to be valid XML 1.0 entity declarations. Note the entity for the Euro currency sign (&euro; or &#8364; or &#x20AC;) is defined as part of the special characters.

                And the HTML 4 character entities are, of course, listed in the corresponding specification.

                Besides, &amp; is valid in all dialects of XML.
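For what it's worth, a plain XML parser with no DTD loaded knows only the five predefined entities (amp, lt, gt, quot, apos) plus numeric references, so a named entity like &copy; is rejected there while &amp; and &#169; are fine. A validator that reads the XHTML DTD accepts &copy;, as the spec says. A sketch, assuming PHP's SimpleXML extension (usually enabled by default); parsesAsXml is a hypothetical helper.

```php
<?php
// Collect libxml errors internally instead of printing warnings.
libxml_use_internal_errors(true);

// Hypothetical helper: does the fragment parse as XML without any DTD?
function parsesAsXml($fragment)
{
    return simplexml_load_string('<p>' . $fragment . '</p>') !== false;
}

var_dump(parsesAsXml('Fish &amp; chips')); // bool(true)
var_dump(parsesAsXml('&#169; 2004'));      // bool(true)
var_dump(parsesAsXml('&copy; 2004'));      // bool(false): entity not declared
```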

                  So I promised this morning, and it's now almost Sunday... oops...

                  First off, I have to agree: XML has no problem with &amp;. In fact, I've been using it without problems for quite a while. I prefer # codes myself.

                  But I have some code to spit out, so let's see if anyone can pinpoint my problem. For starters, here's the function that swaps the codes. It's a pretty generic preg_replace:

                  function seaReplace($temp) {
                      $search = array("'<script[^>]*?>.*?</script>'si", // strip out JavaScript
                                      "'&(?!amp);?'i",  // any "&" not followed by "amp", including the "&" of "&#nnn;"
                                      "'&(lt|#60);'i",
                                      "'&(gt|#62);'i",
                                      "'—|&(mdash);'i",
                                      "'–|&(ndash);'i",
                                      "'¤|&(curren);'i",
                                      "'°|&(deg);'i",
                                      "'®|&(reg);'i",
                                      "'™|&(trade);'i",
                                      "'&(#8756|there4);'i",
                                      "'…|&(#8230|hellip);'i",
                                      "'¡|&(iexcl|#161);'i",
                                      "'¢|&(cent|#162);'i",
                                      "'£|&(pound|#163);'i",
                                      "'€|&(euro);'i",
                                      "'©|&(copy|#169);'i",
                                      "'&rsquo;|&(rsquo);'i",
                                      "'«|&(ldquo);'i",
                                      "'»|&(rdquo);'i",
                                      "'É|&(Eacut);'i",
                                      "'•|&(bull);'i",
                                      "'é|&(eacute);'i");

                      // Replacements, in the same order as the patterns above
                      $replace = array("",
                                       "&amp;",
                                       "<",
                                       ">",
                                       "&#8212;",
                                       "&#8211;",
                                       "&#164;",
                                       "&#176;",
                                       "&#174;",
                                       "&#8482;",
                                       "&#8756;",
                                       "&#8230;",
                                       "&#161;",
                                       "&#162;",
                                       "&#163;",
                                       "&#8364;",
                                       "&#169;",
                                       "&#8217;",
                                       "&#8220;",
                                       "&#8221;",
                                       "&#201;",
                                       "&#8226;",
                                       "&#233;");

                      // preg_replace() applies each pattern in turn to the whole string
                      $var = preg_replace($search, $replace, $temp);

                      return $var;
                  }
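A sketch of what the second pattern, '&(?!amp);?'i, does to text that already contains references, next to one possible replacement pattern (hypothetical, not from the thread). The lookahead only rules out "amp", so the & that starts &#8230; is matched and turned into the start of &amp;#8230;. Because this rule comes second in the array, it also reaches named references such as &copy; before the later rules can. The replacement skips any & that already begins a named, decimal or hex reference; unlike the original it has no ;?, so a stray "&;" is no longer collapsed.

```php
<?php
// The pattern from the post: "&" not followed by "amp", plus an optional ";".
$old = "'&(?!amp);?'i";

// Hypothetical replacement: leave "&" alone when it already begins a
// named (&copy;), decimal (&#169;) or hex (&#xA9;) character reference.
$new = "'&(?!(?:[a-z][a-z0-9]*|#[0-9]+|#x[0-9a-f]+);)'i";

$input = 'Fish &amp; chips &#8230; and R&D';

$before = preg_replace($old, '&amp;', $input);
$after  = preg_replace($new, '&amp;', $input);

echo $before, "\n"; // Fish &amp; chips &amp;#8230; and R&amp;D
echo $after, "\n";  // Fish &amp; chips &#8230; and R&amp;D

// Running the new pattern again changes nothing, so repeated saves are safe.
var_dump(preg_replace($new, '&amp;', $after) === $after); // bool(true)
```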

                  And this is the code that calls the function (cleanFields) that adds slashes to quotes...

                  else {
                      $cleanVal = $this->cleanFields($val, $key);
                      // Cast the id so a non-numeric value can't be injected into the query
                      $sqlUpdate = "UPDATE " . $this->tabName . " SET " . $key . "='$cleanVal' WHERE id=" . (int) $_POST['id'];
                      //print ($sqlUpdate . "<br />\n");
                      mysql_query($sqlUpdate, $GLOBALS['db']) or die(mysql_error());
                  }

                  And this is what I see on the screen...

                  <blockquote><p>If all else fails, tell the truth&#8230;</p></blockquote><p>This is from <a href=\"http://www.amazon.com/exec/obidos/tg/detail/-/0751501999/qid=1082121101/sr=1-5/ref=sr_1_5/102-5900828-8304145?v=glance&amp;s=books\">Stark</a> by Ben Elton. I borrowed this from my <a href=\"http://recently.rainweb.net/hive/215/\" title=\"Greetings from Ireland\">distant cousins in Ireland</a>. It\'s a good book, I\'m just having a hard time getting engrossed in it where I read for 6 hours straight and I\'ve been trying to read it for almost a year.</p>' WHERE id=504

                  But what actually is in the database is...

                  <blockquote><p>If all else fails, tell the truth&amp;#8230;</p></blockquote><p>This is from <a href="http://www.amazon.com/exec/obidos/tg/detail/-/0751501999/qid=1082121101/sr=1-5/ref=sr_1_5/102-5900828-8304145?v=glance&amp;s=books">Stark</a> by Ben Elton. I borrowed this from my <a href="http://recently.rainweb.net/hive/215/" title="Greetings from Ireland">distant cousins in Ireland</a>. It's a good book, I'm just having a hard time getting engrossed in it where I read for 6 hours straight and I've been trying to read it for almost a year.</p>

                  Notice the extra amp; tacked onto the ellipsis: &amp;#8230; instead of &#8230;.

                  I could understand it if the text went through the function twice, but I'm printing the SQL statement right before it goes in, and what ends up in the database isn't what that statement shows. Any help would be greatly appreciated. 🙂 Thanks, everyone.
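One possible explanation, not confirmed in the thread: the commented-out print sends $sqlUpdate to the browser as HTML, and the browser renders &amp;#8230; as the text &#8230;, so the printed statement can look correct even when it isn't (the same "view source" point made earlier). A sketch of a debug print that escapes the text first; debugHtml and the table and column names are hypothetical.

```php
<?php
// Hypothetical helper: escape a string so the browser displays it
// character for character instead of rendering entity references.
function debugHtml($text)
{
    return '<pre>' . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . "</pre>\n";
}

// Hypothetical statement containing the doubled ampersand.
$sqlUpdate = "UPDATE posts SET body='truth&amp;#8230;' WHERE id=504";
echo debugHtml($sqlUpdate);
// The page source now holds truth&amp;amp;#8230;, which the browser
// displays as truth&amp;#8230;, i.e. what MySQL will actually store.
```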
