Question

I'm trying to use the TJSONObject class with Unicode characters, but after parsing I get ???? instead of the original text.

A simple example of the problem:

UnicodeString doc = L"{\"alias\":\"Test ЮРИСЛАВ\"}";
ShowMessage(doc);
TJSONObject* jo=new TJSONObject();
jo->Parse(BytesOf(doc), 0);
ShowMessage(jo->ToString());

The first ShowMessage displays the text correctly: ЮРИСЛАВ
But the second ShowMessage, after parsing, shows ???? instead of ЮРИСЛАВ.

What am I doing wrong?

Answer 1

You are using BytesOf(), which converts a UnicodeString to a byte array using the OS default Ansi encoding. TJSONObject::Parse() prefers UTF-8 instead. It looks for a UTF-8 BOM, and if none is found it makes no assumptions about the encoding of the bytes; it just treats them as 8-bit characters. That will not work with non-ASCII characters, so you need to use UTF-8 instead. To convert a UnicodeString to a UTF-8 encoded byte array, you can use TEncoding::UTF8::GetBytes(), but you would have to prepend the array with a UTF-8 BOM manually:

UnicodeString doc = L"{\"alias\":\"Test ЮРИСЛАВ\"}";
ShowMessage(doc);
TBytes bytes;
bytes.Length = 3 + TEncoding::UTF8::GetByteCount(doc);
bytes[0] = 0xEF;
bytes[1] = 0xBB;
bytes[2] = 0xBF;
TEncoding::UTF8::GetBytes(doc, 1, doc.Length(), bytes, 3);
TJSONObject* jo = new TJSONObject();
jo->Parse(bytes, 0);
ShowMessage(jo->ToString());
//...
delete jo;

That being said, you should be using the static TJSONObject::ParseJSONValue() method instead of TJSONValue::Parse() directly. ParseJSONValue() even has an overload that accepts a UnicodeString as input and will convert it to a UTF-8 encoded byte array internally for you:

UnicodeString doc = L"{\"alias\":\"Test ЮРИСЛАВ\"}";
ShowMessage(doc);
TJSONObject* jo = (TJSONObject*) TJSONObject::ParseJSONValue(doc);
ShowMessage(jo->ToString());
//...
delete jo;

But if you did need to pass in your own byte array, the other overloads of ParseJSONValue() allow you to specify whether the byte array is UTF-8 encoded or not (it assumes UTF-8 by default), so you don't need a BOM.
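
To make the byte-level difference concrete, here is a minimal standalone sketch in standard C++ (not the C++Builder RTL). The helpers `toUtf8` and `toAsciiLossy` are hypothetical names introduced only for illustration: the first encodes code points as UTF-8 with an optional BOM, the second mimics a lossy single-byte conversion on a system whose code page has no Cyrillic letters. The real Ansi conversion depends on the OS locale, so treat the lossy helper as an assumption, not as what BytesOf() literally does.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: encode Unicode code points as UTF-8 bytes,
// optionally prefixed with the UTF-8 BOM (EF BB BF).
std::vector<std::uint8_t> toUtf8(const std::u32string& text, bool withBom) {
    std::vector<std::uint8_t> out;
    if (withBom) {
        out.push_back(0xEF);
        out.push_back(0xBB);
        out.push_back(0xBF);
    }
    for (char32_t cp : text) {
        if (cp < 0x80) {                      // 1 byte: plain ASCII
            out.push_back(static_cast<std::uint8_t>(cp));
        } else if (cp < 0x800) {              // 2 bytes: includes Cyrillic
            out.push_back(static_cast<std::uint8_t>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {            // 3 bytes
            out.push_back(static_cast<std::uint8_t>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        } else {                              // 4 bytes
            out.push_back(static_cast<std::uint8_t>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}

// Hypothetical helper: mimic a single-byte conversion that cannot
// represent non-ASCII characters and substitutes '?' for each of them.
std::string toAsciiLossy(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        out.push_back(cp < 0x80 ? static_cast<char>(cp) : '?');
    }
    return out;
}
```

With these helpers, `toAsciiLossy(U"Test \u042E")` yields `"Test ?"`, because the original character is already gone before any parser sees the bytes, while `toUtf8(U"\u042E", true)` yields the BOM followed by the two bytes 0xD0 0xAE for Ю (U+042E). In C++Builder itself, TEncoding::UTF8::GetBytes() performs the UTF-8 step for you.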

Answer 2

Change it to this:

jo = (TJSONObject*) TJSONObject::ParseJSONValue(TEncoding::UTF8::GetBytes(doc), 0);

You should convert the Unicode text to UTF-8 before parsing.

Parsing Unicode text with TJSONObject returns '????'
