textfield, special characters and str_replace()

ricod

:queasy: The situation :
I'm working on a site where they want to add a new article once a month or so. They want to be able to do so without my help, but no one there knows HTML, let alone PHP (well, I don't really know it either) or Flash.

My idea was that they could write the page as a text file and then I'd like PHP to add some basic markup.
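A minimal sketch of that idea, assuming paragraphs in the text file are separated by blank lines and the file is saved as ANSI (Windows-1252). The function name text_to_paragraphs is hypothetical, and this is only one way to do it:

```php
<?php
// Hypothetical helper: turn plain text into simple HTML paragraphs.
// Assumption: paragraphs are separated by one or more blank lines.
function text_to_paragraphs(string $text): string
{
    // Normalise Windows (\r\n) and old Mac (\r) line endings to \n.
    $text = str_replace(["\r\n", "\r"], "\n", $text);

    // Split on blank lines; PREG_SPLIT_NO_EMPTY drops empty chunks.
    $chunks = preg_split('/\n\s*\n/', trim($text), -1, PREG_SPLIT_NO_EMPTY);

    $html = '';
    foreach ($chunks as $chunk) {
        // Escape <, > and & so stray characters cannot break the page.
        // ENT_NOQUOTES leaves quote characters untouched.
        // Assumption: the input bytes are Windows-1252 ("ANSI").
        $safe = htmlspecialchars($chunk, ENT_NOQUOTES, 'Windows-1252');

        // Keep single line breaks inside a paragraph as <br /> tags.
        $html .= '<p>' . nl2br($safe) . "</p>\n";
    }
    return $html;
}

// Possible usage, with test.txt as a placeholder file name:
// echo text_to_paragraphs(file_get_contents('test.txt'));
```

This only reads a file and prints markup, so no write functionality is needed on the server.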

:xbones: My problem :

While doing a test with str_replace() I noticed that the plain double quote and the opening (147) and closing (148) double quotes are different characters. They'll probably be using Windows Notepad, and I assume ANSI encoding. Hopefully so, because a UTF file written with Notepad gets a few extra characters added that I don't know how to handle either. But since it's set to ANSI by default ... I think I saw a thread on that while searching for an answer, so I'll look into it later, when I'm a bit more knowledgeable (in 20 years or so 😉 ).
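For the Notepad UTF case, the extra characters are most likely the three-byte UTF-8 byte order mark (EF BB BF) that Notepad writes at the start of the file. That identification is an assumption about this setup, and strip_utf8_bom is a hypothetical helper name. A minimal sketch for dropping it:

```php
<?php
// Hypothetical helper: remove a leading UTF-8 byte order mark, if present.
function strip_utf8_bom(string $text): string
{
    // Double quotes are required so that \xEF etc. become real bytes.
    $bom = "\xEF\xBB\xBF";

    if (substr($text, 0, 3) === $bom) {
        // Return everything after the first three bytes.
        return (string) substr($text, 3);
    }
    // No mark found: return the text unchanged.
    return $text;
}
```

A file saved as ANSI has no such mark, so the function returns it unchanged.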

I found this list on the internet, but I'm not sure what to do with it. Can someone please tell me if I can convert these characters and if so, how (told to me in terms I, as a total noob, would understand) ? Any help will be much appreciated ! 🙂

ricod

I found this post in the manual and it sounds like it does what I want, but I can't get it to work :

Latin1 (iso-8859-1) DONT define chars \x80-\x9f (128-159),
but Windows charset 1252 defines some of them
-- like the infamous msoffice 'magic quotes' (\x92 146).
Dont use those invalid control chars in webpages,
but their html (unicode) entities. See ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
or http://www.microsoft.com/typography/unicode/1252.htm
PS: a '?' in the code means the win-cp1252 dont define the given char.

$badlatin1_cp1252_to_htmlent = 
  array( 
   '\x80'=>'&#x20AC;', '\x81'=>'?', '\x82'=>'&#x201A;', '\x83'=>'&#x0192;', 
   '\x84'=>'&#x201E;', '\x85'=>'&#x2026;', '\x86'=>'&#x2020;', '\x87'=>'&#x2021;', 
   '\x88'=>'&#x02C6;', '\x89'=>'&#x2030;', '\x8A'=>'&#x0160;', '\x8B'=>'&#x2039;', 
   '\x8C'=>'&#x0152;', '\x8D'=>'?', '\x8E'=>'&#x017D;', '\x8F'=>'?', 
   '\x90'=>'?', '\x91'=>'&#x2018;', '\x92'=>'&#x2019;', '\x93'=>'&#x201C;', 
   '\x94'=>'&#x201D;', '\x95'=>'&#x2022;', '\x96'=>'&#x2013;', '\x97'=>'&#x2014;', 
   '\x98'=>'&#x02DC;', '\x99'=>'&#x2122;', '\x9A'=>'&#x0161;', '\x9B'=>'&#x203A;', 
   '\x9C'=>'&#x0153;', '\x9D'=>'?', '\x9E'=>'&#x017E;', '\x9F'=>'&#x0178;' 
  ); 
$str = strtr($str, $badlatin1_cp1252_to_htmlent);

Which I use like this :

$ft = 'test.txt';
$handle = fopen($ft, "rb");
$tc = fread($handle, filesize($ft));
fclose($handle);

$str2 = strtr($tc, $badlatin1_cp1252_to_htmlent);
print $str2;

The frustrating thing is, it worked for 5 seconds or so (I had made it all one line) and then I thought "Hey, I wonder if I can make the function more readable by adding a few line breaks here and there (after the commas)", only to find out it no longer reads it properly. A couple of undos ought to have brought it back to the original state, but no. It won't work anymore. 🙁

Does anyone have an idea what's going on ?
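One thing worth checking, offered as a guess rather than a confirmed diagnosis: in PHP, single-quoted strings do not interpret \x escapes, so '\x93' is the four characters \, x, 9, 3 and never matches the single byte 147 in the file. With double-quoted keys, "\x93" is that one byte. A minimal sketch with only a few entries from the table above, using a sample string in place of the contents of test.txt (the variable name $cp1252_to_htmlent is shortened for illustration):

```php
<?php
// Keys are double-quoted so that "\x93" is the single byte 0x93.
// In single quotes, '\x93' would be four literal characters.
// Only a few entries from the full table are repeated here.
$cp1252_to_htmlent = array(
    "\x80" => '&#x20AC;', // euro sign
    "\x85" => '&#x2026;', // ellipsis
    "\x91" => '&#x2018;', // opening single quote
    "\x92" => '&#x2019;', // closing single quote / apostrophe
    "\x93" => '&#x201C;', // opening double quote
    "\x94" => '&#x201D;', // closing double quote
    "\x96" => '&#x2013;', // en dash
    "\x97" => '&#x2014;', // em dash
);

// Sample input standing in for the bytes read from test.txt.
$tc = "\x93Hello\x94 \x96 5\x80";

// strtr() replaces every key found in $tc with its entity.
$str2 = strtr($tc, $cp1252_to_htmlent);
print $str2;
```

The values can stay single-quoted, since the entities contain no escapes. Line breaks between array entries do not matter to PHP, so reformatting the table should be safe as long as the quotes around the keys stay double.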

maxpup979

What you may want to do is just build a couple of forms for your clients to use to post the articles. This is especially useful if the articles all have the same basic structure. And it is a lot easier than what I think you are trying to do.

ricod

The basic markup part really isn't all that hard. It's just the special characters that are messing things up. They want to use those. They're very precise, and two straight double quotes just aren't the same as an opening and a closing quote.

Thanks for the input though. But making input forms would require adding some write functionality right ? I heard that involves some security issues and I'm not quite there yet.

maxpup979

Depends on how you want to handle the forms. I myself would probably not write it to a file, unless I had no other choice. I prefer storing the info in a DB. I think the problem you are going to have is that if these folks want to display a special character set, the browser that accesses that page must also have the special character set installed. If these characters are simple ANSI, then there shouldn't be a problem. But then again, it's way too early, and I am way too decaffeinated right now...🙂

ricod

😉 I know that situation ...

I'm afraid that learning SQL atm is a bit overkill. Maybe I can suggest it and then they can hire someone else for that. But I think special characters and SQL are also a bit of a problem, from what I've read so far (quite a lot by now, about 40 hours or so of reading up ... too bad I'm not getting paid for this).

The special characters are only ANSI ones, like opening and closing double quotes, percent sign, euro sign, accented letters etc. No multibyte characters. I need to read up on those at a later stage, as I'm working on a (Japanese) kanji site as a personal project, but that's a load of headache I'll save for later, when I can buy some new aspirin. :rolleyes: