Qt / QTBUG-20381

QString::fromUtf8 does not clean up/replace invalid sequences

Details

    • Type: Bug
    • Resolution: Invalid
    • Priority: P3: Somewhat important
    • None
    • Affects Version/s: 4.7.3
    • None
    • Environment: Gentoo Linux x86_64

    Description

      I have invalid Unicode byte sequences in a QByteArray,

      for example:

      F0 9D 93 98 27 F0 9D 93 B6 20

      This looks like two UTF-8 characters, but they aren't.

      Notice the 27 and 20 bytes; they are invalid according to the UTF-8 spec.
      When I then call QString::fromUtf8(barray.constData(), barray.size()).toUtf8(),
      I get the same invalid sequences back, although according to the documentation they should be replaced somehow:

      However, invalid sequences are possible with UTF-8 and, if any such are found, they will be replaced with one or more "replacement characters", or suppressed. These include non-Unicode sequences, non-characters, overlong sequences or surrogate codepoints encoded into UTF-8.
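
For reference, the byte sequence quoted above can be checked with a strict UTF-8 decoder. The following is a minimal sketch in plain C++17, assuming the well-formedness rules of RFC 3629 (correct lead and continuation bytes, no overlong forms, no surrogates, nothing above U+10FFFF). `decodeUtf8Strict` and `checkReportedBytes` are hypothetical helpers written for this illustration; this is not Qt's decoder, and it does not reject non-characters, which the quoted documentation also mentions. Under these assumptions the ten bytes decode cleanly to U+1D4D8, U+0027 (apostrophe), U+1D4F6 and U+0020 (space), which would be consistent with the Invalid resolution.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper (not a Qt API): decodes UTF-8 strictly and returns the
// code points, or std::nullopt at the first ill-formed sequence.
std::optional<std::vector<char32_t>>
decodeUtf8Strict(const std::vector<std::uint8_t>& in)
{
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < in.size()) {
        const std::uint8_t lead = in[i];
        std::size_t extra;   // number of continuation bytes expected
        char32_t cp;         // code point being assembled
        char32_t minimum;    // smallest value allowed for this length
        if (lead < 0x80)                { extra = 0; cp = lead;        minimum = 0x0; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; minimum = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; minimum = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; minimum = 0x10000; }
        else return std::nullopt;       // stray continuation byte or invalid lead

        if (in.size() - i <= extra) return std::nullopt;  // truncated sequence

        for (std::size_t k = 1; k <= extra; ++k) {
            const std::uint8_t b = in[i + k];
            if ((b & 0xC0) != 0x80) return std::nullopt;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < minimum) return std::nullopt;                  // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return std::nullopt;  // surrogate
        if (cp > 0x10FFFF) return std::nullopt;                 // beyond Unicode
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}

// Usage sketch: the ten bytes quoted in the description.
void checkReportedBytes()
{
    const std::vector<std::uint8_t> bytes = {
        0xF0, 0x9D, 0x93, 0x98, 0x27, 0xF0, 0x9D, 0x93, 0xB6, 0x20};
    const auto decoded = decodeUtf8Strict(bytes);
    assert(decoded.has_value());   // the sequence is well-formed under this check
    assert(decoded->size() == 4);  // two 4-byte characters plus two ASCII bytes
}
```

In this reading, 27 and 20 are ordinary single-byte ASCII characters that follow each complete 4-byte sequence, so a round trip through a conforming decoder and encoder would be expected to return the same bytes.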

          People

            Assignee: thiago Thiago Macieira
            Reporter: rion Rion