PHP STRING FUNCTIONS

davidmorley

I am still having trouble with PHP string functions seemingly not doing what the Manual says.

For example, I want to get rid of the £ sign in front of $amount which is £72.78.

I have this at the top of the file:

<?php
ini_set('display_errors',true);
error_reporting(E_ALL);
$new_amount="";

Then further down:

$first_character=mb_substr($amount,0,1);  

// Gets the first character
echo $first_character;
echo nl2br("\n\n");
//$first_character echos a pound sign at this point; so far so good

if ($first_character == "£") {
    $new_amount = str_replace("£", " ", $amount);
}
echo $new_amount;

Nothing is output by echo $new_amount; and no error is shown.

Instead of " " I tried "X" and "" as the replacement. Then I tried the same variants with str_ireplace().

I am blessed if I can see where I haven't written exactly what it says in the Manual.

pbismad

The problem is most likely that the data in $amount is multibyte (UTF-8) encoded (the £ is C2 A3 hex), but the £ you typed in your code for the comparison and the str_replace() is a single-byte character (ISO-8859-1 / Windows-1252, not ASCII), which is just A3 hex. What character encoding is your .php file saved in? It will need to match the data, or you will need to do a conversion in the code so that both values are in the same character encoding.
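To make that concrete, here is a small PHP sketch. It assumes the data is UTF-8 and the .php file was saved as ISO-8859-1 / Windows-1252; the variable names are illustrative only. It shows why the == test fails and one way to convert so the bytes match (mbstring assumed). Saving the .php file itself as UTF-8 is often the simpler fix, since the £ typed in the code then already matches the data.

```php
<?php
// Sketch: the same visible "£" stored as two different byte sequences.
// Assumption: $amount arrived as UTF-8, while the .php file was saved in a
// single-byte encoding such as ISO-8859-1 / Windows-1252.
$poundUtf8   = "\u{00A3}";  // U+00A3 in UTF-8: bytes C2 A3 (PHP 7+ escape syntax)
$poundLatin1 = "\xA3";      // U+00A3 in ISO-8859-1: the single byte A3

echo bin2hex($poundUtf8), "\n";    // c2a3
echo bin2hex($poundLatin1), "\n";  // a3

// == compares the bytes, so this is false and the if-block never runs.
var_dump($poundUtf8 == $poundLatin1);  // bool(false)

// One fix (requires mbstring): convert the single-byte value to UTF-8 first.
$converted = mb_convert_encoding($poundLatin1, 'UTF-8', 'ISO-8859-1');
var_dump($converted === $poundUtf8);   // bool(true)
```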

NogDog

Do you know for sure that $amount actually has a value at the point where you call str_replace()? (I guess if not, nothing at all would be output?)
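A quick way to answer that kind of question is var_dump(), which shows the type and the length in bytes. A minimal sketch, assuming a UTF-8 value like the one in this thread:

```php
<?php
// Sketch: var_dump() shows whether the variable holds a string and its length
// in BYTES, which also exposes the extra byte of a UTF-8 "£".
$amount = "\u{00A3}72.78";   // assumed UTF-8 value, as in the thread
var_dump($amount);           // string(7) "£72.78"
echo bin2hex($amount), "\n"; // c2a337322e3738
```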

davidmorley

To NogDog: I echoed $amount higher up in the code and it contained a value like £75.34.

To pbismad: I see you are correct, because $y=strlen($amount); gives $y as 7 when it should be 6.

This leads me into stuff that I don't know. I am developing on Windows 10 with the latest WAMP stack. I do not know how to set things up so that all my strings are multibyte. Then, later, when I move the project to the external server, am I using "my" PHP or "their" PHP?

I am wondering whether to save $amount in the database table just as the customer wrote it: either £75.34 or 75.34. Maybe it would be easier to strip the £ sign inside the database table?

Thank you both for your help.

Derokorian

I mean, if you just want to remove non-numeric characters, you could do something like this; in my experience it works with multibyte strings:

$value = "£75.34";
$new_value = preg_replace('/[^0-9\.-]/','', $value);
var_dump($new_value);
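Wrapped as a reusable function, the same preg_replace() idea might look like the sketch below (strip_non_numeric() is a hypothetical name). It works even on UTF-8 input because the pattern has no /u modifier and so operates byte by byte: both bytes of a UTF-8 £ are non-digits and get removed.

```php
<?php
// Hypothetical helper wrapping the preg_replace() call above.
// No /u modifier: the pattern works byte by byte, so both bytes of a UTF-8
// "£" (C2 A3) are removed and the result is plain ASCII.
function strip_non_numeric(string $value): string
{
    return preg_replace('/[^0-9.\-]/', '', $value) ?? '';
}

var_dump(strip_non_numeric("\u{00A3}75.34"));      // string(5) "75.34"
var_dump(strip_non_numeric("75.34"));               // string(5) "75.34"
var_dump(strip_non_numeric(" \u{00A3}1,234.50 "));  // string(7) "1234.50"
```

It only strips characters; it does not check that what remains is a sensible number, so input such as "1.2.3" passes through unchanged. A follow-up check such as is_numeric() or a stricter regex would be needed for validation.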

sneakyimp

davidmorley wrote:
To pbismad : I see you are correct because $y=strlen($amount); yields $y as 7 when it should be six.

You might look into using the multibyte string functions like mb_strlen(), and this may give you the result you want for now, but you should be aware that any string of numbers and letters assumes some character encoding. One man's £ is another man's \xc2\xa3.
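The difference between counting bytes and counting characters looks like this in a minimal sketch, assuming $amount is UTF-8 and the mbstring extension is loaded. Passing 'UTF-8' explicitly means the result does not depend on mb_internal_encoding() or default_charset.

```php
<?php
// Sketch assuming $amount holds UTF-8 text and mbstring is available.
$amount = "\u{00A3}75.34";

echo strlen($amount), "\n";                       // 7 (bytes)
echo mb_strlen($amount, 'UTF-8'), "\n";           // 6 (characters)
echo mb_substr($amount, 0, 1, 'UTF-8'), "\n";     // £
echo mb_substr($amount, 1, null, 'UTF-8'), "\n";  // 75.34
```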

davidmorley wrote:
This leads me into stuff that I don't know. I am developing using W10 with the latest wampstack. I do not know how to set things so that all my strings are multibyte. Then, later, when I move the project to the external server am I using "my" PHP or "their" PHP ?

This is not so much an "us vs. them" thing. Anything that stores text really just contains 1s and 0s. If you want to know what letters and numbers those 1s and 0s represent, you have to know which charset was used to encode them. All text handled by computers assumes some charset:
* text files
* PHP files
* database tables
* HTML pages
* emails

If you've ever opened someone else's text and found it full of funny characters, that is probably due to a failure at some stage to interpret the text using the correct character set.
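A classic source of those funny characters is decoding UTF-8 bytes with the wrong charset. This sketch (mbstring assumed) simulates that by telling mb_convert_encoding() that UTF-8 bytes are ISO-8859-1:

```php
<?php
// Sketch: simulate reading UTF-8 bytes as if they were ISO-8859-1.
$utf8 = "\u{00A3}75.34";   // bytes: C2 A3 37 35 2E 33 34

// Each byte is wrongly taken as one ISO-8859-1 character and re-encoded as
// UTF-8: C2 becomes "Â" and A3 becomes "£".
$garbled = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo $garbled, "\n";                      // Â£75.34
echo mb_strlen($garbled, 'UTF-8'), "\n";  // 7
```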

davidmorley wrote:
I am wondering whether to save $amount in the database table just as the customer wrote it : either £75.34 or 75.34. Maybe it would be easier to shed the £ sign inside the dbt ?

This is a very helpful example, and it may shed some light on the issue you are facing. If you drop the £ and just store the digits, what happens when someone else comes along, reads the amount and thinks "wow, this stuff is cheap!" because they assume it is in US dollars ($)? It should be clear that when you store data, you make assumptions about what you are encoding. Similarly, if I cram the entire text of War & Peace (the original Russian manuscript) into a text file or database encoded in a Cyrillic character set, then someone who opens it expecting the English translation encoded as ASCII text is going to be sorely disappointed.

I am lucky that Weedpacket spent a fair amount of time helping me understand character encoding in this thread. It can take some effort to get your head around. The basic idea is that if you receive text (via email, text file, database query, etc.) then that text has been encoded into zeros and ones using a particular charset and you must be mindful of that charset if you want to do anything to the text. Think about each of these steps:
* The user requests a web page.
* Before sending the page, Apache sends a header specifying the charset:

  Content-Type: text/html; charset=UTF-8

* The user's browser receives the web page, which includes a charset declaration:

  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

* The user submits text into a form on the page. That form has an accept-charset attribute:

  <form method="post" accept-charset="UTF-8">

* Some PHP script receives the form's submission via $_POST and wants to check its length, parse it, etc. USE THE MB STRING FUNCTIONS!
* The PHP script wants to insert the text into a database. MAKE SURE THE DB USES UTF-8 CHARACTER ENCODING (in MySQL, that means utf8mb4), and that the database connection does too.
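The PHP end of that chain might look roughly like the sketch below. It assumes UTF-8 throughout and the mbstring extension; check_field(), the field name and the DSN values are hypothetical, and the header/PDO lines are shown only as comments because they need a real web request and database.

```php
<?php
// Hypothetical helper for the "PHP receives $_POST" step, assuming UTF-8 everywhere.
function check_field(string $raw, int $maxChars): string
{
    // Reject bytes that are not valid UTF-8 (e.g. a lone Windows-1252 "£", byte A3).
    if (!mb_check_encoding($raw, 'UTF-8')) {
        throw new InvalidArgumentException('Submitted text is not valid UTF-8');
    }
    // Count characters, not bytes.
    if (mb_strlen($raw, 'UTF-8') > $maxChars) {
        throw new LengthException("Longer than $maxChars characters");
    }
    return $raw;
}

// Typical use in a page script (field name 'amount' is hypothetical):
//   header('Content-Type: text/html; charset=UTF-8');
//   $amount = check_field($_POST['amount'] ?? '', 20);
// For the database step with PDO and MySQL, the charset goes in the DSN, e.g.:
//   $pdo = new PDO('mysql:host=localhost;dbname=shop;charset=utf8mb4', $user, $pass);
```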

As you can see, it can be tricky. I just try to remember that true ASCII has only 128 characters (codes 0 to 127), so any fancy characters, including £, require a bigger character set. That being the case, I just try to make sure my text is always encoded as UTF-8. Everywhere. All the time.
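To check whether a string stays inside ASCII's 128 characters, testing for any byte above 127 is enough. A minimal sketch; is_plain_ascii() is a hypothetical name:

```php
<?php
// Hypothetical helper: true if every byte is in the 7-bit ASCII range (0-127).
// No /u modifier, so the character class is applied to raw bytes.
function is_plain_ascii(string $s): bool
{
    return preg_match('/[^\x00-\x7F]/', $s) === 0;
}

var_dump(is_plain_ascii('75.34'));          // bool(true)
var_dump(is_plain_ascii("\u{00A3}75.34"));  // bool(false): bytes C2 and A3 are above 127
```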

davidmorley

OK, that seems to work. Thank you.

davidmorley

OK, this is very helpful. I'll study it and look at the thread you mentioned.

NogDog

If everything will be in one currency, I would just store the numeric amount. If you need to support multiple currencies, I would use two columns: one for the numeric amount and the other for the currency type. The latter might even be a foreign key to a static "currency_type" table.
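Following that two-column idea, here is a rough sketch of turning the customer's input into a numeric amount plus a currency code before storage. Assumptions: UTF-8 input, only £ is recognised, at most two decimal places, and the amount is kept as integer pence to avoid floating-point rounding (a DECIMAL(10,2) column would be another reasonable choice). parse_amount() and its array keys are hypothetical.

```php
<?php
// Hypothetical sketch: split "£75.34" into integer pence plus a currency code,
// ready for two separate columns. Assumes UTF-8 input and that only "£" occurs.
function parse_amount(string $input, string $defaultCurrency = 'GBP'): array
{
    $pound = "\u{00A3}";  // UTF-8 bytes C2 A3
    $value = trim($input);
    $currency = $defaultCurrency;

    // Byte-wise prefix check; works because both strings are UTF-8.
    if (strncmp($value, $pound, strlen($pound)) === 0) {
        $currency = 'GBP';
        $value = substr($value, strlen($pound));
    }

    // Whole units, optionally followed by one or two decimal digits.
    if (!preg_match('/^(\d+)(?:\.(\d{1,2}))?$/', $value, $m)) {
        throw new InvalidArgumentException("Not a valid amount: $input");
    }
    $fraction = str_pad($m[2] ?? '', 2, '0');  // "3" -> "30", missing -> "00"
    return [
        'minor_units' => (int) $m[1] * 100 + (int) $fraction,
        'currency'    => $currency,
    ];
}

print_r(parse_amount("\u{00A3}75.34"));  // minor_units 7534, currency GBP
```

Storing whole pence keeps arithmetic exact; the currency column then records the assumption that would otherwise be lost when the £ sign is stripped.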