I've been looking for a good way to have a regex check a name while also allowing accented letters, as in 'René'.

I've come up with this:
[a-zÀ-ÿ][\'a-zÀ-ÿ -]*$

I don't know whether it is OK to use À-ÿ for the accented letters?

I also discovered you can use hex values in regexes. Is this better?
[a-z\xC0-\xFF][\'a-z\xC0-\xFF -]*$

    I'm no expert on regular expressions, but consider ending the pattern with \z instead of $. The latter also matches just before a trailing newline, so it doesn't mark the true end of the string. As for your question, your second option seems to be your best bet. I have never seen your first option used or documented before.
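A minimal sketch of the `$` versus `\z` point above, using the hex-range pattern from the question. The `^` anchor and the `i` modifier are assumptions added here so that a capitalised name such as "René" can match as a whole; the test strings are written as raw ISO-8859-1 bytes.

```php
<?php
// Assumed variants of the pattern from the question: `^` and the `i`
// modifier are additions for illustration, not part of the original post.
$withDollar = '/^[a-z\xC0-\xFF][\'a-z\xC0-\xFF -]*$/i';
$withZ      = '/^[a-z\xC0-\xFF][\'a-z\xC0-\xFF -]*\z/i';

$name        = "Ren\xE9";     // "René" as ISO-8859-1 bytes (é = 0xE9)
$withNewline = $name . "\n";  // the same name followed by a newline

var_dump(preg_match($withDollar, $name));        // int(1)
var_dump(preg_match($withDollar, $withNewline)); // int(1): `$` also matches before a final newline
var_dump(preg_match($withZ, $name));             // int(1)
var_dump(preg_match($withZ, $withNewline));      // int(0): `\z` matches only at the very end
```

One thing to keep in mind with this range: in ISO-8859-1, \xC0-\xFF also covers × (0xD7) and ÷ (0xF7), so the class accepts two characters that are not letters.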

      At least the code looks clearer with hex codes. But am I right that this only works when the site is viewed with the ISO 8859-1 charset? I've already converted all posted values to ISO 8859-1 with utf8_decode, so that names like René display as René in the sent mail. And index.php is served as ISO 8859-1 by default. So is there no need to set it specifically in my PHP script?
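A small sketch of why the input encoding matters for the hex-range pattern. It compares the same name as ISO-8859-1 bytes and as UTF-8 bytes, and shows one possible UTF-8 alternative using the `u` modifier with `\p{L}`. The anchors, the `i` modifier, and the alternative pattern are assumptions for illustration, not something the posters proposed. The byte strings are written out by hand so the example does not depend on utf8_decode, which is deprecated as of PHP 8.2.

```php
<?php
// Byte-oriented pattern from the thread; `^`, `\z` and `i` are assumptions.
$latin1Pattern = '/^[a-z\xC0-\xFF][\'a-z\xC0-\xFF -]*\z/i';
// Hypothetical UTF-8 alternative: `u` enables UTF-8 mode, \p{L} matches any letter.
$utf8Pattern = '/^\p{L}[\p{L}\' -]*\z/u';

$latin1Name = "Ren\xE9";      // é as the single byte 0xE9
$utf8Name   = "Ren\xC3\xA9";  // é as the two bytes 0xC3 0xA9

var_dump(preg_match($latin1Pattern, $latin1Name)); // int(1)
var_dump(preg_match($latin1Pattern, $utf8Name));   // int(0): byte 0xA9 is outside \xC0-\xFF
var_dump(preg_match($utf8Pattern, $utf8Name));     // int(1)
var_dump(preg_match($utf8Pattern, $latin1Name));   // bool(false): subject is not valid UTF-8
```

So the hex-range version only behaves as intended when the string being checked really is ISO-8859-1 at the moment preg_match runs; the charset the browser displays is a separate matter.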

        Can you explain this: when I have a PHP file that only echoes the é character, the browser defaults to ISO-8859-1. When I change the browser charset to Unicode, it displays a question mark instead, but when I refresh the page or enter the PHP link again, it switches back to ISO 8859-1, even though I haven't explicitly set it.

          I don't think PHP is the problem here. This looks like a web server / browser issue. Add the following to the Apache server configuration file or to a .htaccess file, if it is not already present:

          AddDefaultCharset desired character set here

          Hope this helps. Good luck.

            The web server may be sending a header with a default value. To avoid ambiguity, you should always explicitly set the correct charset, which can be done via a meta tag in the head section of your [X]HTML document:

            <meta http-equiv='Content-Type' content='text/html; charset=ISO-8859-1'>
            

            I believe you could instead use a PHP header as follows, though I always use the meta tag as above.

            <?php
            header('Content-Type: text/html; charset=ISO-8859-1');
            

              Well, it's not really a problem, since I want everything to be ISO 8859-1 (otherwise my regex, which also searches for accented letters through hexadecimal codes, probably wouldn't work). I just thought my host's PHP server might be set to ISO 8859-1 by default, or that all PHP servers assume ISO 8859-1 by default, as someone told me. But that's not the case then?

                Perhaps the web server, as I already told you, not PHP. If you were to read the link I gave you (which stated the problem), you would see that PHP ignores most encodings wider than one byte, as it currently only supports, or assumes, that each character is 8 bits. This could lead someone to say PHP defaults to ISO 8859-1, but in reality it is not about a particular character set, but about a lack of support (to some extent) for other character sets.
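A short sketch of the "one character is 8 bits" point: PHP's core string functions work on bytes, so the same é has a different length depending on the encoding it was stored in. The variable names are hypothetical.

```php
<?php
$latin1 = "\xE9";      // é in ISO-8859-1: one byte
$utf8   = "\xC3\xA9";  // é in UTF-8: two bytes

var_dump(strlen($latin1));   // int(1)
var_dump(strlen($utf8));     // int(2): strlen counts bytes, not characters
var_dump(bin2hex($utf8[0])); // string(2) "c3": indexing also returns a single byte
```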
