I am trying to use the TJSONObject class with Unicode characters, but after parsing I get ???? instead of the original text.
A simple example of the problem:
UnicodeString doc = L"{\"alias\":\"Test ЮРИСЛАВ\"}";
ShowMessage(doc);
TJSONObject* jo=new TJSONObject();
jo->Parse(BytesOf(doc), 0);
ShowMessage(jo->ToString());
The first ShowMessage displays the text correctly: ЮРИСЛАВ
But the second ShowMessage, after parsing, displays ???? instead of ЮРИСЛАВ.
What am I doing wrong?
Answer 0 (score: 2)
You are using BytesOf(), which converts a UnicodeString to a byte array using the OS default ANSI encoding. Characters that the ANSI code page cannot represent are replaced with ? during that conversion. TJSONObject::Parse() prefers UTF-8 instead. It looks for a UTF-8 BOM, and if none is found it makes no assumptions about the encoding of the bytes; it just treats them as 8-bit characters. That will not work when dealing with non-ASCII characters, so you need to use UTF-8. To convert a UnicodeString to a UTF-8 encoded byte array, you can use TEncoding::UTF8->GetBytes(), but you would have to prepend a UTF-8 BOM to the array manually:
UnicodeString doc = L"{\"alias\":\"Test ЮРИСЛАВ\"}";
ShowMessage(doc);
TBytes bytes;
bytes.Length = 3 + TEncoding::UTF8->GetByteCount(doc);
bytes[0] = 0xEF;
bytes[1] = 0xBB;
bytes[2] = 0xBF;
TEncoding::UTF8->GetBytes(doc, 1, doc.Length(), bytes, 3);
TJSONObject* jo = new TJSONObject();
jo->Parse(bytes, 0);
ShowMessage(jo->ToString());
//...
delete jo;
That being said, you should be using the static TJSONObject::ParseJSONValue() method instead of calling TJSONValue::Parse() directly. ParseJSONValue() even has an overload that accepts a UnicodeString as input and will convert it to a UTF-8 encoded byte array internally for you:
UnicodeString doc = L"{\"alias\":\"Test ЮРИСЛАВ\"}";
ShowMessage(doc);
TJSONObject* jo = (TJSONObject*) TJSONObject::ParseJSONValue(doc);
ShowMessage(jo->ToString());
//...
delete jo;
But, if you did need to pass in your own byte array, the other overloads of ParseJSONValue() allow you to specify whether the byte array is UTF-8 encoded or not (UTF-8 is assumed by default), so you don't need a BOM.
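To illustrate why the ANSI round trip loses the text while UTF-8 keeps it, here is a minimal sketch in portable C++ rather than VCL code. The helper names toAsciiLossy and toUtf8 are hypothetical and not part of the RTL. toAsciiLossy only stands in for a code page that has no Cyrillic letters; what BytesOf() actually produces depends on the system code page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: a lossy single-byte conversion, standing in for an
// ANSI code page without Cyrillic letters. Every UTF-16 code unit outside
// the ASCII range is replaced with '?'.
std::vector<std::uint8_t> toAsciiLossy(const std::u16string& text) {
    std::vector<std::uint8_t> out;
    for (char16_t ch : text) {
        out.push_back(ch < 0x80 ? static_cast<std::uint8_t>(ch)
                                : static_cast<std::uint8_t>('?'));
    }
    return out;
}

// Hypothetical helper: encode UTF-16 text as UTF-8, optionally prefixed
// with the UTF-8 BOM (EF BB BF). Lone surrogates are not validated here.
std::vector<std::uint8_t> toUtf8(const std::u16string& text, bool withBom) {
    std::vector<std::uint8_t> out;
    if (withBom) {
        out.push_back(0xEF);
        out.push_back(0xBB);
        out.push_back(0xBF);
    }
    for (std::size_t i = 0; i < text.size(); ++i) {
        std::uint32_t cp = text[i];
        // Combine a high/low surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < text.size()) {
            std::uint32_t low = text[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }
        if (cp < 0x80) {
            out.push_back(static_cast<std::uint8_t>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<std::uint8_t>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<std::uint8_t>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<std::uint8_t>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

With these helpers, the letter Ю (U+042E) survives as the two bytes D0 AE in the UTF-8 output, while the lossy conversion leaves only a ? byte. Once that has happened, no parser can recover the original text.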
Answer 1 (score: 1)
Change it to this:
jo = (TJSONObject*) TJSONObject::ParseJSONValue(TEncoding::UTF8->GetBytes(doc), 0);
You should convert the Unicode text to UTF-8 before parsing it.