davidmorley;11058951 wrote:To pbismad : I see you are correct because $y=strlen($amount); yields $y as 7 when it should be six.
You might look into using the multibyte string functions like [man]mb_strlen[/man] and this may give you the result you want for now, but you should be aware that any string of numbers & letters assumes some Character Encoding. One man's £ is another man's \xc2\xa3.
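To make the 7 versus 6 difference concrete, here is a minimal sketch. It assumes PHP 7 or later (for the `\u{...}` escape), that the mbstring extension is loaded, and that $amount holds UTF-8 text:

```php
<?php
// "\u{00A3}" is the pound sign, which PHP writes out as the UTF-8 bytes C2 A3.
$amount = "\u{00A3}75.34";

echo strlen($amount), "\n";                            // 7: counts bytes
echo mb_strlen($amount, 'UTF-8'), "\n";                // 6: counts characters
echo bin2hex(mb_substr($amount, 0, 1, 'UTF-8')), "\n"; // c2a3: the pound sign alone
```

strlen() is not wrong here, it just answers a different question (how many bytes) than the one you are asking (how many characters).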
davidmorley;11058951 wrote:This leads me into stuff that I don't know. I am developing using W10 with the latest wampstack. I do not know how to set things so that all my strings are multibyte. Then, later, when I move the project to the external server am I using "my" PHP or "their" PHP ?
This is not so much an us vs them thing. Anything that stores text really just contains 1's and 0's. If you want to know what letters and numbers those 1's and 0's represent, you have to know what charset was used to encode them. All text manipulated by computers assumes some charset:
* text files
* php files
* database tables
* html pages
* emails
If you've ever had trouble with some other person's text where there are a bunch of funny characters, that is probably due to a failure at some stage to interpret text using the correct character set.
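Here is a small sketch of how those funny characters come about. It assumes the mbstring extension is available, and the choice of ISO-8859-1 as the "wrong" charset is just for illustration:

```php
<?php
// The pound sign encoded as UTF-8 is the two bytes C2 A3.
$pound = "\xC2\xA3";

// Misread those two bytes as ISO-8859-1 (one byte per character), then
// re-encode the result as UTF-8 so it can be displayed.
$misread = mb_convert_encoding($pound, 'UTF-8', 'ISO-8859-1');

echo $misread, "\n";          // prints "Â£"
echo bin2hex($misread), "\n"; // prints "c382c2a3"
```

Nothing was corrupted in transit. The same bytes were simply read with the wrong charset, and one character turned into two.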
davidmorley;11058951 wrote:I am wondering whether to save $amount in the database table just as the customer wrote it : either £75.34 or 75.34. Maybe it would be easier to shed the £ sign inside the dbt ?
This is a very helpful example which may shed some light on the issue you are facing. If you drop the £ and just store the digits, what happens when some other person comes along, reads $amount, and thinks "wow this stuff is cheap!" because they assume that $amount represents US Dollars ($)? It should be clear that when you store data, you make assumptions about what you are encoding. Similarly, if I cram the entire text of War & Peace (the original Russian manuscript) into a text file or database encoded in a Cyrillic charset, then when someone opens it up expecting the English translation encoded as ASCII text, they are going to be sorely disappointed.
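One way to make that assumption explicit is to store the currency next to the number instead of relying on the £ sign. The sketch below is only an illustration: parse_amount(), the 'GBP' default, and the choice to store pence as an integer are all my own assumptions, not something from your code. It also assumes the input is UTF-8 and mbstring is loaded.

```php
<?php
// Hypothetical helper: splits "£75.34" or "75.34" into a currency code
// and an integer number of pence.
function parse_amount(string $input, string $defaultCurrency = 'GBP'): array
{
    $input = trim($input);
    $currency = $defaultCurrency;

    // Compare the first CHARACTER (not the first byte) with the pound sign.
    if (mb_substr($input, 0, 1, 'UTF-8') === "\u{00A3}") {
        $currency = 'GBP';
        $input = mb_substr($input, 1, null, 'UTF-8');
    }

    // Accept digits with an optional one- or two-digit decimal part.
    if (!preg_match('/^\d+(\.\d{1,2})?$/', $input)) {
        throw new InvalidArgumentException('Unrecognised amount');
    }

    // "75.3" means 75 pounds 30 pence, so pad the pence on the right.
    [$pounds, $pence] = array_pad(explode('.', $input), 2, '0');
    $pence = str_pad($pence, 2, '0');

    return [
        'currency'    => $currency,
        'minor_units' => (int) $pounds * 100 + (int) $pence,
    ];
}

print_r(parse_amount("\u{00A3}75.34")); // currency GBP, minor_units 7534
```

With something like this, both £75.34 and 75.34 end up as the same two columns, and nobody has to guess which currency the digits mean.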
I am lucky that Weedpacket spent a fair amount of time helping me understand character encoding in this thread. It can take some effort to get your head around. The basic idea is that if you receive text (via email, text file, database query, etc.) then that text has been encoded into zeros and ones using a particular charset and you must be mindful of that charset if you want to do anything to the text. Think about each of these steps:
* user requests a web page
* Apache, before sending the web page, sends a header specifying the charset:
  Content-Type: text/html; charset=UTF-8
* user's browser receives the web page, which includes a charset declaration:
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
* user submits text into a form on the web page. That form has an accept-charset attribute:
  <form method="post" accept-charset="UTF-8">
* some PHP script receives the form's submission via $_POST and wants to check its length, parse it, etc. USE THE MB CHARACTER FUNCTIONS!
* PHP script wants to insert the text into a database. MAKE SURE THE DB USES UTF-8 CHARACTER ENCODING.
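The PHP side of the list above might look roughly like this. Treat it as a sketch under assumptions: the function names are made up for illustration, the 20-character limit is arbitrary, and the DSN assumes MySQL or MariaDB accessed through PDO, where utf8mb4 is the charset name that covers all of UTF-8.

```php
<?php
// Tell the browser which charset the page uses. Call this before any output.
function send_utf8_header(): void
{
    header('Content-Type: text/html; charset=UTF-8');
}

// Reject form input that is not valid UTF-8, then measure it in characters.
function clean_amount_field(string $raw, int $maxChars = 20): string
{
    if (!mb_check_encoding($raw, 'UTF-8')) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    if (mb_strlen($raw, 'UTF-8') > $maxChars) {
        throw new InvalidArgumentException('Input is too long');
    }
    return $raw;
}

// Build a PDO DSN that asks MySQL for a UTF-8 connection.
// Host and database name are placeholders.
function utf8_dsn(string $host, string $dbname): string
{
    return "mysql:host={$host};dbname={$dbname};charset=utf8mb4";
}
```

In a real script you would call something like clean_amount_field($_POST['amount']) and pass utf8_dsn('localhost', 'yourdb') to new PDO() along with your own credentials. The tables and columns themselves also need a UTF-8 charset, which is set on the database side rather than in PHP.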
As you can see, it can be tricky. I just try to remember that true ASCII has only 128 characters (codes 0 through 127), so any fancy chars, including £, will require a fancier character set. That being the case, I just try to make sure my text is always encoded as UTF-8. Everywhere. All the time.
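If you want a quick way to tell whether a string stays inside plain ASCII, a byte-range check is enough. The is_ascii() name below is my own, purely for illustration:

```php
<?php
// Hypothetical helper: true when every byte is in the ASCII range 0x00-0x7F.
function is_ascii(string $s): bool
{
    return preg_match('/^[\x00-\x7F]*$/', $s) === 1;
}

var_dump(is_ascii('75.34'));         // bool(true)
var_dump(is_ascii("\u{00A3}75.34")); // bool(false): the pound sign is bytes C2 A3
```

For pure ASCII text, strlen() and mb_strlen() agree. The moment a £ shows up they stop agreeing, which is why sticking to UTF-8 and the mb functions throughout saves trouble.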