Question

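I came here from this old discussion, but the solution there did not help much, because my original data is encoded differently:

My original data is already a unicode string, and I need to output it as UTF-8: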

data={"content":u"\u4f60\u597d"}

When I try to convert it to UTF-8:

json.dumps(data, indent=1, ensure_ascii=False).encode("utf8")

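The output I get is "content": "ä½ å¥½", while the expected output is "content": "你好".

I also tried it without ensure_ascii=False, and the output then stays in the plain \u-escaped form: "content": "\u4f60\u597d".

How can I convert the previously \u-escaped JSON to UTF-8?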

Answer 1

You already have UTF-8 JSON data:

>>> import json
>>> data = {'content': u'\u4f60\u597d'}
>>> json.dumps(data, indent=1, ensure_ascii=False)
u'{\n "content": "\u4f60\u597d"\n}'
>>> json.dumps(data, indent=1, ensure_ascii=False).encode('utf8')
'{\n "content": "\xe4\xbd\xa0\xe5\xa5\xbd"\n}'
>>> print json.dumps(data, indent=1, ensure_ascii=False).encode('utf8')
{
 "content": "你好"
}

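My terminal happens to be configured to handle UTF-8, so printing the UTF-8 bytes to it produced the desired output.

However, if your terminal is not set up for such output, it is the terminal that shows the "wrong" characters:

>>> print json.dumps(data, indent=1, ensure_ascii=False).encode('utf8').decode('latin1')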
{
 "content": "ä½ å¥½"
}

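Note that I decoded the data as Latin-1 to deliberately misread the UTF-8 bytes.

This is not a Python problem; it is a matter of how whatever tool you use to read these bytes handles UTF-8.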
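To make the point above concrete, here is a minimal sketch in Python 3 (the session above is Python 2). The variable names are only illustrative. It shows that the encoded bytes are valid UTF-8, that the garbled text appears only when those bytes are decoded as Latin-1, and that the garbling is reversible because no data was lost.

```python
import json

data = {"content": "\u4f60\u597d"}

# In Python 3, json.dumps returns a str; ensure_ascii=False keeps the
# non-ASCII characters instead of writing \uXXXX escapes.
text = json.dumps(data, indent=1, ensure_ascii=False)

# Encoding produces the UTF-8 bytes that a file or terminal would receive.
utf8_bytes = text.encode("utf-8")

# A reader that wrongly assumes Latin-1 sees one character per byte.
garbled = utf8_bytes.decode("latin-1")

# Undoing the wrong decode recovers the original text unchanged.
repaired = garbled.encode("latin-1").decode("utf-8")

print(repaired == text)
```

If the comparison holds, the bytes themselves were fine all along and the fix belongs in the viewer or terminal configuration, not in the json.dumps call.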

Answer 2

This works in Python 2, but in Python 3 print shows the repr of the bytes object instead:

b'{\n "content": "\xe4\xbd\xa0\xe5\xa5\xbd"\n}'

Do not use encode('utf8'):

>>> print(json.dumps(data, indent=1, ensure_ascii=False))
{
 "content": "你好"
}

Or use sys.stdout.buffer.write instead of print:

>>> import sys
>>> import json
>>> data = {'content': u'\u4f60\u597d'}
>>> sys.stdout.buffer.write(json.dumps(data, indent=1,
...     ensure_ascii=False).encode('utf8') + b'\n')
{
 "content": "你好"
}

See Write UTF-8 to stdout, regardless of the console's encoding.
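As a rough sketch of the second approach, the encode-and-write step can be wrapped in a small function that targets any binary stream. Note that dump_utf8 is a hypothetical helper name, not part of the json module, and this assumes Python 3. An in-memory io.BytesIO stands in for the output here so the exact bytes can be inspected.

```python
import io
import json


def dump_utf8(obj, stream):
    """Hypothetical helper: write obj as UTF-8 encoded JSON to a binary stream."""
    # ensure_ascii=False keeps real characters; encode() turns the str into bytes.
    payload = json.dumps(obj, indent=1, ensure_ascii=False).encode("utf-8") + b"\n"
    stream.write(payload)
    return payload


# io.BytesIO is used in place of a real output stream for inspection.
buffer = io.BytesIO()
dump_utf8({"content": "\u4f60\u597d"}, buffer)
print(buffer.getvalue())
```

Passing sys.stdout.buffer as the stream should give the same behaviour as the snippet above it: the UTF-8 bytes are written directly, independent of the encoding Python picked for sys.stdout.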

json.dumps: converting \u-escaped unicode to UTF-8
