Details
-
Bug
-
Resolution: Invalid
-
P3: Somewhat important
-
None
-
4.7.3
-
None
-
Gentoo Linux x86_64
Description
I have invalid Unicode byte sequences in a QByteArray,
for example:
F0 9D 93 98 27 F0 9D 93 B6 20
This looks like two UTF-8 characters, but it isn't.
Notice the 27 and 20 bytes: they are invalid according to the UTF-8 spec.
When I then call QString::fromUtf8(barray.constData(), barray.size()).toUtf8(),
I get the same byte sequences back, although according to the documentation they should be replaced in some way:
However, invalid sequences are possible with UTF-8 and, if any such are found, they will be replaced with one or more "replacement characters", or suppressed. These include non-Unicode sequences, non-characters, overlong sequences or surrogate codepoints encoded into UTF-8.
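For reference, here is a minimal sketch, not taken from the report and not Qt's implementation, that decodes a byte sequence by hand using the standard UTF-8 lead-byte and continuation-byte rules. The helper name decodeUtf8 is hypothetical, and emitting one U+FFFD per offending byte is an assumption made for illustration. It only shows how the ten bytes above break down into code points.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Qt): strictly decodes UTF-8 and returns
// the resulting code points. Assumption for this sketch: every byte that
// cannot start a well-formed sequence is reported as U+FFFD.
std::vector<char32_t> decodeUtf8(const std::vector<std::uint8_t>& bytes)
{
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const std::uint8_t b = bytes[i];
        std::size_t len = 0;   // total length of the sequence, 0 = bad lead byte
        char32_t cp = 0;       // code point being assembled
        char32_t min = 0;      // smallest value allowed for this length

        if (b < 0x80)                { len = 1; cp = b;        min = 0; }
        else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; min = 0x10000; }

        // The whole sequence must fit, and each trailing byte must be 10xxxxxx.
        bool ok = (len != 0) && (i + len <= bytes.size());
        for (std::size_t k = 1; ok && k < len; ++k) {
            const std::uint8_t c = bytes[i + k];
            if ((c & 0xC0) != 0x80)
                ok = false;
            else
                cp = (cp << 6) | (c & 0x3F);
        }

        // Reject overlong forms, surrogates, and values beyond U+10FFFF.
        if (ok && (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)))
            ok = false;

        if (ok) {
            out.push_back(cp);
            i += len;
        } else {
            out.push_back(0xFFFD);
            i += 1;
        }
    }
    return out;
}
```

Under these rules, F0 9D 93 98 and F0 9D 93 B6 are each complete four-byte sequences (U+1D4D8 and U+1D4F6), while 27 and 20 fall outside them and decode as the single-byte ASCII characters apostrophe and space. That would explain why the round trip returns the bytes unchanged.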