I'm trying to do some validation on usernames for my script.
Basically, I'd like to allow Unicode usernames: letters and numbers from any language, plus the underscore, with some formatting rules, such as no username starting with a number or an underscore.
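The end goal would look roughly like the sketch below. It is only an illustration of the rules above, not tested across languages; the function name is a placeholder, and using \p{N} for "number" and the D modifier (so that $ does not match before a trailing newline) are assumptions about what is wanted.

```php
<?php
// Hypothetical helper; the name and the exact rule set are assumptions.
function is_valid_username(string $name): bool
{
    // First character: any Unicode letter (\p{L}).
    // Remaining characters: letters, numbers (\p{N}), or underscore.
    // u = treat pattern and subject as UTF-8; D = $ matches only at the very end.
    return preg_match('/^\p{L}[\p{L}\p{N}_]*$/uD', $name) === 1;
}

var_dump(is_valid_username('Prüfung_1')); // letter first, then letters/digit/underscore
var_dump(is_valid_username('_test'));     // starts with an underscore
var_dump(is_valid_username('1test'));     // starts with a number
```

Note that \p{N} covers every Unicode number category; \p{Nd} would restrict it to decimal digits if that is closer to the intent.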
I started out with a basic regular expression that just matches any letter character (no punctuation or symbols, just letters) in any language.
I'm using the /u modifier so that the preg functions treat the pattern and subject as UTF-8.
I'm using \p{L}, the Unicode property escape that matches any letter.
Code:
preg_match('/^[\p{L}]+$/u','TEXT HERE');
This seems to work well for most of the languages I've tested:
English 'test' ok
German 'Prüfung' ok
Russian 'испытание' ok
Norwegian 'prøve' ok
Japanese 'テスト' ok
Chinese Simplified '试验' Doesn't work
Chinese Traditional '試驗' Doesn't work
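One way to narrow this down might be to separate "no match" from "error": preg_match() returns false (not 0) when something goes wrong, for example when the subject is not valid UTF-8. The sketch below is just a debugging aid; describe_match is a made-up helper name, and it assumes the script file itself is saved as UTF-8.

```php
<?php
// Hypothetical debugging helper; not part of the original script.
function describe_match(string $pattern, string $subject): string
{
    $result = preg_match($pattern, $subject);
    if ($result === false) {
        // false means the match could not be performed at all,
        // e.g. PREG_BAD_UTF8_ERROR for a subject that is not valid UTF-8.
        return 'error code ' . preg_last_error();
    }
    return $result === 1 ? 'match' : 'no match';
}

foreach (['test', 'Prüfung', '试验', '試驗'] as $word) {
    // bin2hex() shows the raw bytes, which helps confirm the actual encoding.
    echo $word, ' [', bin2hex($word), ']: ',
         describe_match('/^[\p{L}]+$/u', $word), "\n";
}
```

If the Chinese strings report an error code rather than "no match", the problem is more likely the encoding of the input than the pattern itself.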
If anyone can explain why the Chinese strings are not being matched by this regular expression, that would be great. - Dan