Hi all.
Here's another question about encodings. I'll try to explain it in a few words:
I'm sending raw data from a Flash app to this PHP page. The data consists of 2 parts:
- header telling which encoding the body uses
- body (which is basically a text/xml file encoded using some non-UTF-8 encoding).

What I want to do:
I want to convert the body to a UTF-8 string and append it to a DOMDocument instance.
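
To make the intended flow concrete, here is a minimal sketch of the split-and-convert step on its own. The function name decodePayload() and the lookup table are hypothetical and not part of my Receiver class; the sketch assumes the header is exactly 20 bytes, padded with NUL bytes, and that iconv knows the listed encodings.

```php
<?php
// Hypothetical helper (not part of the Receiver class in this post).
// Assumed layout: a 20-byte, NUL-padded header naming the encoding,
// followed by the body in that encoding.
function decodePayload(string $raw): array
{
    // Names as sent on the wire, mapped to names passed to iconv().
    $known = [
        'utf-8'        => 'UTF-8',
        'iso-8859-5'   => 'ISO-8859-5',
        'koi8-r'       => 'KOI8-R',
        'windows-1251' => 'WINDOWS-1251',
    ];

    // Slice by bytes: the header length is defined in bytes, not characters.
    // trim() also strips the NUL padding.
    $name = strtolower(trim(substr($raw, 0, 20)));
    if (!isset($known[$name])) {
        throw new InvalidArgumentException("unrecognised encoding: " . $name);
    }

    // Everything after the header is the body, still in the source encoding.
    $body = (string) substr($raw, 20);
    $utf8 = iconv($known[$name], 'UTF-8', $body);
    if ($utf8 === false) {
        throw new RuntimeException("conversion from " . $known[$name] . " failed");
    }

    return [$known[$name], $utf8];
}
```

It could be called as list($enc, $xml) = decodePayload(file_get_contents("php://input")); the returned string is what would then go to the DOM.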

Header recognition works fine. Conversion also works fine (see the comments in the code).

But loading the converted string into DOMDocument produces &#XXXX; references for every Unicode letter outside plain ASCII. 🙁

The question is: what do I need to do to make DOMDocument::loadXML() keep the UTF-8 text instead of substituting ASCII character references for it? (The PHP files are saved in UTF-8 encoding.)

<?php
/**
 * The file is saved with UTF-8 encoding
 * and, while editing it I can use 
 * foreign language characters
 */
mb_internal_encoding("UTF-8");
mb_regex_encoding("UTF-8");
mb_http_output("UTF-8");

class Receiver extends DOMDocument {
public static $US_ASCII = "us-ascii";
public static $US_ASCII_PHP = "cp1252";

public static $UTF_8 = "utf-8";
public static $UTF_8_PHP = "UTF-8";

public static $ISO_8859_5 = "iso-8859-5";
public static $ISO_8859_5_PHP = "ISO-8859-5";

public static $KOI8_R = "koi8-r";
public static $KOI8_R_PHP = "KOI8-R";

public static $WINDOWS_1251 = "windows-1251";
public static $WINDOWS_1251_PHP = "Windows-1251";

public $src;
public $enc;

function __construct() {
	parent::__construct("1.0", Receiver::$UTF_8_PHP);
	if(isset($GLOBALS["HTTP_RAW_POST_DATA"])){
		/**
		 * The data consists of 2 chunks:
		 * chunk 1 (20 bytes) names the encoding
		 * of the data that follows and is always
		 * encoded using UTF-8.
		 * 
		 * chunk 2 contains data with custom encoding
		 * in my example it's windows-1251
		 */
		// The header is a fixed 20 bytes, so slice by bytes rather than by characters.
		$ienc = substr($GLOBALS["HTTP_RAW_POST_DATA"], 0, 20);
		$ienc = trim($ienc);
		switch($ienc) {
			case Receiver::$UTF_8:
				$this->enc = Receiver::$UTF_8_PHP;
				break;
			case Receiver::$US_ASCII:
				$this->enc = Receiver::$US_ASCII_PHP;
				break;
			case Receiver::$ISO_8859_5:
				$this->enc = Receiver::$ISO_8859_5_PHP;
				break;
			case Receiver::$KOI8_R:
				$this->enc = Receiver::$KOI8_R_PHP;
				break;
			case Receiver::$WINDOWS_1251:
				$this->enc = Receiver::$WINDOWS_1251_PHP;
				break;
			default:
				exit("unrecognised encoding " . $ienc);
		}
		// The body starts at byte 20; measure and slice it in bytes, not characters.
		$lnt = strlen($GLOBALS["HTTP_RAW_POST_DATA"]) - 20;
		$this->src = substr($GLOBALS["HTTP_RAW_POST_DATA"], 20, $lnt);
		//$unpacked = unpack("C" . $lnt . "chars", $this->src);
		//var_dump($unpacked);
		//$this->src = join("", $unpacked);
		$this->src = iconv($this->enc, Receiver::$UTF_8_PHP, $this->src);
		$bom = "\xEF\xBB\xBF";
		$xml = fopen("log.xml", "w+");
		fwrite($xml, $bom . $this->src);
		/**
		 * The saved file is encoded in UTF-8 and looks fine
		 * all the characters converted as they should
		 */
		$this->loadXML($this->src);
		/**
		 * But, when I try to parse the same text
		 * into DOMDocument it will insert these characters:
		 * &#x442;&#x435;&#x43A;&#x441;&#x442; &#x43F;&#x43E [...];
		 * instead of:
		 * текст по-русски.
		 * However, they will be displayed in a browser
		 * the same way as the above Cyrillic text.
		 * I would like the XML file to be also 
		 * human-readable...
		 */
	} else {
		$nodata = $this->appendChild($this->createElement("noData"));
		$nodata->appendChild($this->createTextNode("No data posted!"));
	}
}
/**
 * @return string
 */
function toString() {
	$this->formatOutput = true;
	var_dump($this);
	// this line will output gibberish :(
	return $this->saveXML();
	// this line will output the xml as it should look
	//return urldecode($this->src);
}
}
?>

Here's the ActionScript I'm using to send the test data:

package
{
	import flash.display.Sprite;
	import org.wvxvw.phputils.XMLSender;

public class Main extends Sprite
{
	public var xs:XMLSender;
	public var testXML:XML = 
	<xml>
		<English>
		English text
		</English>
		<Russian>
		текст по - русски
		</Russian>

	</xml>;

	public function Main():void
	{
		xs = new XMLSender();
		xs.sendXML(testXML, 'http://localhost/flashtest/flashreceiver.php', XMLSender.WINDOWS_1251);
	}
}
}
/**
* ...
* @author wvxvw
*/
package  org.wvxvw.phputils
{
	import flash.events.EventDispatcher;
	import flash.events.Event;
	import flash.events.IOErrorEvent;
	import flash.events.SecurityErrorEvent;
	import flash.net.URLLoader;
	import flash.net.URLRequest;
	import flash.net.URLLoaderDataFormat;
	import flash.net.URLRequestMethod;
	import flash.net.URLRequestHeader;
	import flash.utils.ByteArray;
	import flash.utils.Endian;

public class XMLSender extends EventDispatcher
{
	public static const US_ASCII:String = 'us-ascii';
	public static const UTF_8:String = 'utf-8';
	public static const ISO_8859_5:String = 'iso-8859-5';
	public static const KOI8_R:String = 'koi8-r';
	public static const WINDOWS_1251:String = 'windows-1251';

	private var ur:URLRequest;
	private var ul:URLLoader;
	private var ba:ByteArray;
	private var xml:XML;
	private var encodedString:String;

	public function XMLSender() 
	{
		super();
		ul = new URLLoader();
		ul.dataFormat = URLLoaderDataFormat.BINARY;
		ul.addEventListener(Event.COMPLETE, handleComplete);
		ul.addEventListener(IOErrorEvent.IO_ERROR, handleIOError);
		ul.addEventListener(SecurityErrorEvent.SECURITY_ERROR, handleSecurityError);
	}
	public function sendXML(source:Object, url:String, encoding:String = UTF_8, endian:String = Endian.LITTLE_ENDIAN):void
	{
		try
		{
			xml = new XML(source);
		} catch (e:Error) {
			trace('unable to convert to XML');
			return; // nothing to send if the source is not valid XML
		}
		encodedString = xml.toXMLString();
		ur = new URLRequest(url);
		var h:URLRequestHeader = new URLRequestHeader('Content-Type', 'application/octet-stream');
		ur.method = URLRequestMethod.POST;
		ur.requestHeaders.push(h);
		ba = new ByteArray();
		ba.endian = endian;
		trace(encodedString);
		ba.writeUTFBytes(encoding);
		ba.position = 20;
		ba.writeMultiByte(encodedString, encoding);
		ba.position = 0;
		ur.data = ba;
		ul.load(ur);
	}
	public function handleComplete(evt:Event):void
	{
		trace('complete', ul.data);
	}
	public function handleIOError(evt:Event):void
	{
		trace('IO', evt);
	}
	public function handleSecurityError(evt:Event):void
	{
		trace('security', evt);
	}
}

}

Any input from you is more than welcome =)

OK... I've finally found a workaround, but it looks almost wrong to me... =/

$sxml = simplexml_load_string($this->src);
$dxml = $this->importNode(dom_import_simplexml($sxml), true);
$this->appendChild($dxml);

I.e. creating a SimpleXMLElement from the string first, and then importing it as a DOM node, finally made DOMDocument append the node without converting its content. So it sort of works now, but it would be great if someone could explain why DOMDocument wouldn't do the same thing without all the SimpleXML conversions...
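
A possible explanation, offered as a sketch rather than a confirmed diagnosis: loadXML() replaces the whole document, including the encoding passed to the constructor. If the loaded string carries no XML declaration, the document ends up with no declared encoding, and saveXML() then writes non-ASCII characters as numeric references. The SimpleXML route never calls loadXML() on the receiving document, so that document keeps its UTF-8 encoding. The snippet below uses a made-up sample string and a plain DOMDocument instead of my Receiver class.

```php
<?php
// Sample input (an assumption for illustration): UTF-8 XML without a declaration.
$utf8 = "<xml><Russian>текст</Russian></xml>";

$doc = new DOMDocument("1.0", "UTF-8");
$doc->loadXML($utf8);

// After loadXML() the encoding set in the constructor is gone, so the
// serializer falls back to numeric character references for non-ASCII text.
$escaped = $doc->saveXML();

// Restoring the encoding after loading lets saveXML() emit the characters as UTF-8.
$doc->encoding = "UTF-8";
$readable = $doc->saveXML();

echo $escaped, "\n", $readable, "\n";
```

If that is what happens here, setting $this->encoding right after $this->loadXML($this->src), or prepending an XML declaration with encoding="UTF-8" to the string before loading it, should give the same readable output without the SimpleXML detour.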
