Qt / QTBUG-25291

QDomDocument::save wrong encoding for UTF-8 char coded on 4 bytes

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Not Evaluated
    • Fix Version/s: 5.0.0
    • Affects Version/s: 4.8.0
    • Component/s: XML: DOM
    • Labels: None
    • Environment: Ubuntu 11.10

    Description

      I have a problem writing a special UTF-8 character into an XML document. The character comes from a UTF-8 encoded XML document and is encoded on 4 bytes. I simply need to save it in another XML document.

      Depending on the method I use, I get either:

      • a string that does not represent my special char at all
      • an invalid entity
      • an empty element

      whereas I would expect my special character to be encoded on 4 bytes again.
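For reference, the 4 bytes used in this report (F0 9D 8C 86) are the UTF-8 encoding of a single code point outside the Basic Multilingual Plane, U+1D306. The following is a minimal standalone C++ sketch of that encoding, with no Qt dependency; the helper name `encodeUtf8FourBytes` is hypothetical and only illustrates what the expected output bytes would be.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of Qt): encodes a supplementary-plane
// code point (U+10000..U+10FFFF) as its 4-byte UTF-8 sequence.
std::string encodeUtf8FourBytes(std::uint32_t codePoint)
{
    assert(codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    std::string out;
    out += static_cast<char>(0xF0 | (codePoint >> 18));           // lead byte: 11110xxx
    out += static_cast<char>(0x80 | ((codePoint >> 12) & 0x3F));  // continuation: 10xxxxxx
    out += static_cast<char>(0x80 | ((codePoint >> 6) & 0x3F));   // continuation: 10xxxxxx
    out += static_cast<char>(0x80 | (codePoint & 0x3F));          // continuation: 10xxxxxx
    return out;
}
```

Calling `encodeUtf8FourBytes(0x1D306)` yields the bytes F0 9D 8C 86, which is what a correctly saved UTF-8 document would be expected to contain inside the element.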

      (see http://qt-project.org/forums/viewthread/16273/#81966 for a more readable code sample)

      QDomDocument xmlDoc;

      //create a byte array containing a UTF-8 character encoded on 4 bytes (note: this is a valid character coming from a valid UTF-8 encoded XML file)
      QByteArray originalSpecialChar;
      originalSpecialChar.append(static_cast<char>(0xF0));
      originalSpecialChar.append(static_cast<char>(0x9D));
      originalSpecialChar.append(static_cast<char>(0x8C));
      originalSpecialChar.append(static_cast<char>(0x86));

      //put it in a QString (thus converted to Qt's internal UTF-16 representation, but it is still the right character)
      QString originalSpecialCharInString = QString::fromUtf8(originalSpecialChar.constData(), 4);

      //add this string into a new XML doc (encoded in UTF-8)
      xmlDoc.appendChild(xmlDoc.createProcessingInstruction("xml", "version=\"1.0\" encoding=\"UTF-8\""));

      QDomElement rootNode = xmlDoc.createElement("RootNode");
      xmlDoc.appendChild(rootNode);

      QDomText textNode = xmlDoc.createTextNode(originalSpecialCharInString);
      rootNode.appendChild(textNode);

      //at this point, the special character is still correct in the QDomDocument (so the UTF-8 -> Unicode -> UTF-8 round trip actually works!)
      if (textNode.nodeValue().toUtf8() != originalSpecialChar)
          qDebug() << "invalid (1)"; //this is not printed

      //save the xml doc into a QByteArray (using save)
      QByteArray xmlContent;
      QTextStream textStream(&xmlContent);
      xmlDoc.save(textStream, 0, QDomNode::EncodingFromDocument); //note: same result if I force the textStream codec to UTF-8 and use EncodingFromTextStream

      qDebug() << xmlContent; //shows <?xml version="1.0" encoding="UTF-8"?><RootNode>#xdf06;</RootNode>
      //the node contains the string "#xdf06". This is really not the character I expect

      //save with toString()
      qDebug() << xmlDoc.toString(0); //shows <?xml version="1.0" encoding="UTF-8"?><RootNode>�</RootNode>
      //Qt is actually able to read this document, but a C# client is not, because it contains an invalid entity. If I use QDomImplementation::DropInvalidChars, the element is empty, so Qt knows the character is invalid.

      qDebug() << xmlDoc.toString(0).toUtf8(); //shows <?xml version="1.0" encoding="UTF-8"?><RootNode>#xdf06;</RootNode>
      //this does not help!

      //what I would expect (this is actually what my original xml file looked like):
      //<?xml version="1.0" encoding="UTF-8"?><RootNode>my original char coded on 4 bytes in the utf-8 doc</RootNode>
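The `df06` in the output above can be related to the input bytes with a little arithmetic. The following standalone C++ sketch (no Qt dependency; both helper names are hypothetical) decodes the 4-byte UTF-8 sequence into its code point and then splits that code point into the two UTF-16 code units that a UTF-16 string such as QString stores for it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical helper: decodes one 4-byte UTF-8 sequence into its code point.
// No validation beyond the length check; this is only an illustration.
std::uint32_t decodeUtf8FourBytes(const std::string &bytes)
{
    assert(bytes.size() == 4);
    const auto b = [&bytes](std::size_t i) {
        return static_cast<std::uint32_t>(static_cast<unsigned char>(bytes[i]));
    };
    return ((b(0) & 0x07) << 18) | ((b(1) & 0x3F) << 12)
         | ((b(2) & 0x3F) << 6)  |  (b(3) & 0x3F);
}

// Hypothetical helper: splits a supplementary-plane code point into its
// UTF-16 (high, low) surrogate pair.
std::pair<std::uint16_t, std::uint16_t> toSurrogatePair(std::uint32_t codePoint)
{
    assert(codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    const std::uint32_t v = codePoint - 0x10000;
    return { static_cast<std::uint16_t>(0xD800 + (v >> 10)),     // high surrogate
             static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF)) }; // low surrogate
}
```

For the bytes F0 9D 8C 86, `decodeUtf8FourBytes` returns 0x1D306 and `toSurrogatePair` returns 0xD834 and 0xDF06. The low surrogate matches the `#xdf06;` seen in the saved output, which suggests (this report does not confirm it) that the serializer escaped UTF-16 code units individually rather than the whole code point. A lone surrogate is not a valid XML character, which would be consistent with the C# client rejecting the document.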

          People

            Assignee: Unassigned
            Reporter: Alain Mazy (alainmazy)
            Votes: 1
            Watchers: 5
