Imagettftext won't render a UTF8 character

cadev · Feb 5, 2013

Hello, I've come across a problem that I can't solve.
I'm using imagettftext to render some strings. My strings are UTF8 encoded, and I can successfully render pretty quotation marks etc using TTF Helvetica.
The problem is that when I try to render a wide line (character #8213) it renders as a box - the generic fail box. The php docs say straight UTF8 encoded strings can be sent in directly as of php 5.0 (I'm using 5.3.3). Is that character not supported by the FreeType engine? Help!

cadev · Feb 7, 2013

Can anyone help?

sneakyimp · Feb 7, 2013

I wonder if TTF Helvetica has glyphs for every possible UTF8 character? I suspect it does not. A quick search for "utf8 fonts" yielded this post which states:

As Brian Neal stated in his comments, UTF-8 is just one of several encoding standards that can represent every character in the Unicode character set, which currently contains more than 100000 entries.

So you are actually asking for a true type font that supports all 100k+ unicode characters currently in use on this planet (and it's a moving target, as the set gets expanded and adjusted constantly).

So I guess the literal answer is no, and you should probably check your preconditions (what character subsets are likely encountered in your use case), and search for a fitting multi-purpose font.

However, there are attempts to provide fonts that cover large amounts of the unicode space - search for 'Pan-Unicode Fonts' to get an overview.

One possibility might be to specify a font suitable for the string's original language from a selection of language-specific fonts. Another possibility would be to change the string encoding to something else like Latin-1 or ASCII, although the obvious result would be to lose your ability to render certain non-ASCII and/or non-Latin-1 glyphs. See mb_convert_encoding() in the PHP manual.
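To illustrate that trade-off, here is a minimal sketch of what happens when a character with no Latin-1 equivalent goes through mb_convert_encoding(). It assumes the mbstring extension is loaded, and the sample string is made up for the demonstration.

```php
<?php
// Assumption: the mbstring extension is available.
// Make the substitute for unrepresentable characters explicit ("?" = 0x3F).
mb_substitute_character(0x3F);

// "A" + U+2015 HORIZONTAL BAR + "B", written as raw UTF-8 bytes so the
// result does not depend on the encoding of this source file.
$utf8 = "A\xE2\x80\x95B";

// ISO-8859-1 (Latin-1) has no horizontal bar, so the character cannot survive.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

echo $latin1, "\n"; // prints "A?B"
```

So the conversion does not fail, it silently swaps in the substitute character, which is probably not what you want in a rendered image.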

cadev · Feb 8, 2013

Sneaky, thank you! That's very helpful, I'll follow that trail and see where it leads.

sneakyimp · Feb 8, 2013

Glad to help. The basic idea is that UTF8 can encode bazillions of characters (English, Arabic, Russian, Chinese, Japanese, et al.) and you are unlikely to find some magic font that has a nice pretty glyph for every possible letter/symbol/punctuation mark in every language.

Do please let us know what you find out? I'd be interested in knowing what sort of information you find.

Weedpacket · Feb 8, 2013

Any decent font viewer should let you see every character in a given font (for those of you following at home, cadev did imply that this question is about character U+2015 HORIZONTAL BAR, used as a quotation dash; as distinct from an en-dash or em-dash, which any Helvetica font would have. Oh, and not U+8213, which is the Han character meaning "to lick").

Also, how precisely are you supplying the character? If it is as a literal "―" in a string, for example, then it will be encoded using the same encoding that the rest of the file uses (as determined by your editor).
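One way to take the editor's encoding out of the picture is to write the character as explicit bytes or as a decimal character reference. The sketch below only checks that both spellings produce the same UTF-8 bytes. As far as I recall, the PHP manual says imagettftext also accepts decimal references such as &#8213; directly in the text argument, but treat that as something to verify against your PHP version.

```php
<?php
// U+2015 HORIZONTAL BAR as explicit UTF-8 bytes: immune to the encoding
// your editor uses when saving the source file.
$bar = "\xE2\x80\x95";

// The same character built from its decimal reference (8213 decimal = 0x2015).
$fromEntity = html_entity_decode('&#8213;', ENT_QUOTES, 'UTF-8');

var_dump($bar === $fromEntity); // bool(true)
echo bin2hex($bar), "\n";       // e28095
echo dechex(8213), "\n";        // 2015
```

If a literal "―" typed into your source file does not give the bytes e28095 under bin2hex(), the file is not being saved as UTF-8.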

sneakyimp · Feb 8, 2013

Weedpacket;11023757 wrote:
Oh, and not U+8213, which is the Han character meaning "to lick").

(舓) We need an English glyph for that.

Weedpacket;11023757 wrote:
Also, how precisely are you supplying the character? If it is as a literal "―" in a string, for example, then it will be encoded using the same encoding that the rest of the file uses (as determined by your editor).

Weedpacket asks an important question. There is usually a custody chain in the lifetime of a character. For example:
1) User visits your website and downloads your comment entry form using their browser. This entry form declares a particular character encoding. Does the form have an accept-charset attribute? Did the server send a charset in its Content-Type header?
2) User types 舓 or the horizontal bar into the entry form and clicks submit. I'm not sure, but I believe the browser encodes the submission based on the char encoding declarations in the page and any headers sent by the server hosting the input form.
3) Something on the server stores the data. Is it a database? If so, these also have character encodings associated with their client connection and also with their storage format.
4) When your script reads this data (from file, from database, from socket?) there may also be some kind of encoding translation, etc.

As you can imagine, this custody chain can cause problems with your encoding if you aren't careful at each step.

Weedpacket · Feb 9, 2013

sneakyimp;11023765 wrote:
We need an English glyph for that.

:p

cadev · Feb 12, 2013

I believe the character is being pasted in from an external source, not typed in.

The editor is custom and the data is being sent via XHR with UTF8 encoding. The DB is set up to store in UTF8 as well. My rendering script (php) sets the encoding of the mysql connection charset to utf8 as well. So, the character makes it all the way to the renderer and the 'imagettftext' function.

I've looked at some character maps for Helvetica and it doesn't look like this character (U+2015) has a glyph in the font, which is why it's being rendered as a square box.

So, here's my next tough question. Aside from creating a manual array'd character list of what my font(s) all contain, is there any other way to check a character against a font (or several fonts) to make sure it has a glyph? I'm guessing that would have to be done via shell ('system' function)?

Thanks guys

sneakyimp · Feb 12, 2013

cadev;11023879 wrote:
I believe the character is being pasted in from an external source, not typed in.

Pasted into what? A form hosted in a browser? Some piece of software? A file? The application where this data "gets pasted in" is every bit as capable of botching the data handoff as any other link in the data custody chain.

cadev;11023879 wrote:
The editor is custom and the data is being sent via XHR with UTF8 encoding.

The custom editor may or may not support a utf8 charset.

cadev;11023879 wrote:
The DB is set up to store in UTF8 as well.

Good! What about the connection you make to your database? It has a charset as well [EDIT: see next comment].

cadev;11023879 wrote:
My rendering script (php) sets the encoding of the mysql connection charset to utf8 as well.

OK wait it sounds like maybe you covered the client connection's charset then....

cadev;11023879 wrote:
So, the character makes it all the way to the renderer and the 'imagettftext' function.

One cannot be too sure about this. The ball could have been dropped at the entry stage. You might want to write these glyphs from your script that is retrieving them to a file where you can inspect them and be sure.
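A rough sketch of that kind of inspection follows. The helper name describe_string() and the log path in the usage comment are made up for illustration, and it assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical helper: describe a string as hex bytes, a UTF-8 validity
// flag, and the list of code points it decodes to.
function describe_string($s)
{
    $valid  = mb_check_encoding($s, 'UTF-8');
    $points = array();
    if ($valid) {
        // UCS-4BE uses exactly 4 bytes per character, so each character
        // unpacks as one unsigned 32-bit big-endian integer.
        $ucs4 = mb_convert_encoding($s, 'UCS-4BE', 'UTF-8');
        foreach (unpack('N*', $ucs4) as $cp) {
            $points[] = sprintf('U+%04X', $cp);
        }
    }
    return sprintf(
        '%s | %s | %s',
        bin2hex($s),
        $valid ? 'valid UTF-8' : 'NOT valid UTF-8',
        implode(' ', $points)
    );
}

echo describe_string("\xE2\x80\x95"), "\n"; // e28095 | valid UTF-8 | U+2015

// Usage idea (path is an assumption): log what the renderer really receives.
// file_put_contents('/tmp/glyph-debug.log', describe_string($text) . "\n", FILE_APPEND);
```

If the horizontal bar arrives intact you should see e28095 and U+2015. Anything else (a lone 95 byte, or a longer run of bytes that decodes to several unrelated characters) points at a dropped ball earlier in the chain.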

cadev;11023879 wrote:
I've looked at some character maps for Helvetica and it doesn't look like this character (U+2015) has a glyph in the font, which is why it's being rendered as a square box.

This certainly sounds to me like a show-stopper for that char. You could run a test with some other font suitable for the original data's language -- if you know what it is.

cadev;11023879 wrote:
So, here's my next tough question. Aside from creating a manual array'd character list of what my font(s) all contain, is there any other way to check a character against a font (or several fonts) to make sure it has a glyph? I'm guessing that would have to be done via shell ('system' function)?

Thanks guys

Weedpacket · Feb 13, 2013

sneakyimp wrote:
This certainly sounds to me like a show-stopper for that char. You could run a test with some other font suitable for the original data's language -- if you know what it is.

One of the things provided by the site I referenced in my links earlier is a list of fonts known to the site that support a given character. If necessary, download sources can be found from the fonts' entries.
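On cadev's question about checking a character against a font from a script: one possible approach that avoids the shell is to read the font's own character map (the 'cmap' table) in PHP. The sketch below is an assumption-laden illustration, not a tested library. The function name font_has_glyph() is made up, it handles only cmap format 4 (characters up to U+FFFF), it expects a plain .ttf/.otf file rather than a .ttc collection, and it does no bounds checking on a damaged file.

```php
<?php
// Sketch: does the font data contain a glyph for the given code point?
// $font is the raw file contents, $codePoint an integer such as 0x2015.
function font_has_glyph($font, $codePoint)
{
    // Big-endian readers for 16-bit and 32-bit unsigned values.
    $u16 = function ($off) use ($font) {
        $v = unpack('n', substr($font, $off, 2));
        return $v[1];
    };
    $u32 = function ($off) use ($font) {
        $v = unpack('N', substr($font, $off, 4));
        return $v[1];
    };

    // 1. Find the 'cmap' table in the table directory (16-byte records
    //    starting after the 12-byte header).
    $cmap = null;
    $numTables = $u16(4);
    for ($i = 0; $i < $numTables; $i++) {
        $rec = 12 + 16 * $i;
        if (substr($font, $rec, 4) === 'cmap') {
            $cmap = $u32($rec + 8);
            break;
        }
    }
    if ($cmap === null || $codePoint > 0xFFFF) {
        return false;
    }

    // 2. Pick a Unicode subtable (platform 0, or platform 3 encoding 1)
    //    that uses format 4.
    $sub = null;
    $numSub = $u16($cmap + 2);
    for ($i = 0; $i < $numSub; $i++) {
        $rec = $cmap + 4 + 8 * $i;
        $platform = $u16($rec);
        $encoding = $u16($rec + 2);
        $offset = $cmap + $u32($rec + 4);
        $isUnicode = $platform === 0 || ($platform === 3 && $encoding === 1);
        if ($isUnicode && $u16($offset) === 4) {
            $sub = $offset;
            break;
        }
    }
    if ($sub === null) {
        return false;
    }

    // 3. Walk the segments: the first one whose end code is >= the code
    //    point either covers it or proves that it is unmapped.
    $segCount  = $u16($sub + 6) >> 1;
    $endBase   = $sub + 14;
    $startBase = $endBase + 2 * $segCount + 2; // skip reservedPad
    $deltaBase = $startBase + 2 * $segCount;
    $rangeBase = $deltaBase + 2 * $segCount;
    for ($i = 0; $i < $segCount; $i++) {
        if ($u16($endBase + 2 * $i) < $codePoint) {
            continue;
        }
        $start = $u16($startBase + 2 * $i);
        if ($start > $codePoint) {
            return false;
        }
        $delta = $u16($deltaBase + 2 * $i);
        $rangeOffset = $u16($rangeBase + 2 * $i);
        if ($rangeOffset === 0) {
            $glyph = ($codePoint + $delta) & 0xFFFF;
        } else {
            // idRangeOffset is relative to its own position in the table.
            $addr = $rangeBase + 2 * $i + $rangeOffset + 2 * ($codePoint - $start);
            $glyph = $u16($addr);
            if ($glyph !== 0) {
                $glyph = ($glyph + $delta) & 0xFFFF;
            }
        }
        return $glyph !== 0; // glyph 0 is the "missing glyph" box
    }
    return false;
}
```

Usage would be something like font_has_glyph(file_get_contents($fontPath), 0x2015), where $fontPath is whatever font file you pass to imagettftext. With several fonts you could loop over them and render with the first one that returns true.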
