Hi
I'm building a site in several languages and using UTF-8 encoding to display all the characters properly.
This works very nicely, except when I use substr() and it cuts the text off just after an accented letter.
Example:
I have this text in the DB:
"Bienvenue à l’ESMA Aviation Academy !
Vous avez choisi d’entrer dans le monde aéronautique, monde de passion (...)"
When I do
<?php echo substr($str,0,100); ?>
the text cuts off just after the "é" in "aéronautique", but the "é" itself is displayed as a rectangle. I couldn't work out why this was happening, as all the other accented characters were displaying properly.
However, I found that by lengthening the substr() by 1 character
<?php echo substr($str,0,101); ?>
the "é" character displays properly!
So I did some research and discovered that Unicode is a lot more complex than I thought: some letters are assembled from two or more characters even though they display as a single character, which would explain the substr() anomaly.
But I don't know what I can do to ensure that substr() doesn't cut a composite character in the middle.
The only thing I can think of is to write a substr()-like function that doesn't cut words in the middle, but instead continues to the end of the word and cuts at the next space.
But I haven't got a clue how to go about doing that.
Could someone suggest a method that I could use?
thanks
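
For reference, here is a rough sketch of the kind of function described above. It is only an illustration under some assumptions: the mbstring extension is enabled, the string is valid UTF-8, and the name substr_to_word_end() is made up for this example. It relies on the fact that substr() counts bytes, whereas the mb_* functions count characters (in UTF-8 an accented letter such as "é" takes two bytes).

```php
<?php
// Hypothetical helper (not a built-in): returns the start of $str,
// cut at the first space found at or after character offset $length,
// so neither a multibyte character nor a word is split.
// Assumes the mbstring extension and a UTF-8 encoded string.
function substr_to_word_end(string $str, int $length, string $encoding = 'UTF-8'): string
{
    // Nothing to cut if the string is already short enough.
    if (mb_strlen($str, $encoding) <= $length) {
        return $str;
    }

    // Offsets here are counted in characters, not bytes.
    $next_space = mb_strpos($str, ' ', $length, $encoding);

    // No further space: the current word runs to the end of the string.
    if ($next_space === false) {
        return $str;
    }

    return mb_substr($str, 0, $next_space, $encoding);
}
```

With this sketch, substr_to_word_end($str, 100) would replace substr($str, 0, 100). If cutting exactly at a character boundary is enough, mb_substr($str, 0, 100, 'UTF-8') on its own avoids splitting the "é".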