Question

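I have a JSON file that happens to contain a lot of Chinese and Japanese (and other-language) characters. I'm loading it into my Python 2.7 script using io.open, as follows: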

with io.open('multiIdName.json', encoding="utf-8") as json_data:
    cards = json.load(json_data)

I add a new property to the JSON, and that all works fine. Then I try to write it back out to another file:

with io.open("testJson.json",'w',encoding="utf-8") as outfile:
    json.dump(cards, outfile, ensure_ascii=False)

That's when I get the error TypeError: must be unicode, not str.

I tried opening the outfile in binary mode (with io.open("testJson.json",'wb') as outfile:), but I end up with something like this:

{"multiverseid": 262906, "name": "\u00e6\u00b8\u00b8\u00e9\u009a\u00bc\u00e7\u008b\u00ae\u00e9\u00b9\u00ab", "language": "Chinese Simplified"}

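I thought that opening and writing the file with the same encoding would be enough, together with the ensure_ascii flag, but clearly it isn't. I just want to preserve the characters that were in the file before I ran my script, without them being turned into \u escapes.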

Answer 1

Can you try the following?

filter(None, (func(i) for i in range(10)))

Answer 2

The cause of this error is the completely stupid behaviour of json.dumps in Python 2:

>>> json.dumps({'a': 'a'}, ensure_ascii=False)
'{"a": "a"}'
>>> json.dumps({'a': u'a'}, ensure_ascii=False)
u'{"a": "a"}'
>>> json.dumps({'a': 'ä'}, ensure_ascii=False)
'{"a": "\xc3\xa4"}'
>>> json.dumps({u'a': 'ä'}, ensure_ascii=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/json/__init__.py", line 250, in dumps
    sort_keys=sort_keys, **kw).encode(obj)
  File "/usr/lib/python2.7/json/encoder.py", line 210, in encode
    return ''.join(chunks)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

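This, coupled with the fact that a file opened by io.open with an encoding set accepts only unicode objects (which in itself is correct), leads to the problem.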
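The second half of that can be seen in isolation. The following is a minimal sketch, assuming a throwaway temp file (the path and the `rejected` flag are illustrative and not from the question): a file opened in text mode through io.open refuses byte strings and accepts only unicode text. On Python 2 the message is the "must be unicode, not str" from the question; Python 3 words it differently but also raises TypeError.

```python
import io
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.json")

with io.open(path, "w", encoding="utf-8") as outfile:
    try:
        # A byte string (plain str on Python 2) is rejected by the text layer.
        outfile.write(b'{"a": "a"}')
        rejected = False
    except TypeError:
        rejected = True
    # Unicode text is what the file object expects.
    outfile.write(u'{"a": "a"}')
```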

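When ensure_ascii=False, the return type depends entirely on the types of the keys and values in the dictionary, whereas str is always returned when ensure_ascii=True. If 8-bit strings can accidentally end up in your dictionaries, you cannot blindly convert this return value to unicode, because you need to specify the encoding, presumably UTF-8: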

>>> x = json.dumps(obj, ensure_ascii=False)
>>> if isinstance(x, str):
...     x = unicode(x, 'UTF-8')

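In this case I believe you can use json.dump to write to a file opened in binary mode; however, if you need to do something more complicated with the resulting object, you will probably need the code above.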
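One way to package that check is a small wrapper that always hands back text. This is only a sketch and not part of the json module: `dumps_text` is a hypothetical name, and it assumes that any byte strings inside the object are UTF-8 encoded.

```python
# -*- coding: utf-8 -*-
import json


def dumps_text(obj, **kwargs):
    """Serialize obj to JSON and always return text (hypothetical helper).

    Assumes any byte strings inside obj are UTF-8 encoded.
    """
    result = json.dumps(obj, ensure_ascii=False, **kwargs)
    # On Python 2, bytes is an alias for str, so this catches the str case.
    # On Python 3, json.dumps always returns text and the branch never runs.
    if isinstance(result, bytes):
        result = result.decode("utf-8")
    return result


card = {"name": u"游隼狮鹫"}
text = dumps_text(card)
```

The result can then be passed to the write() method of a file opened with io.open and an encoding.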

One solution is to end all this encoding/decoding madness by switching to Python 3.
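For comparison, here is a minimal Python 3 sketch of the same round trip. The temp-file path, the sample card (modeled on the one in the question) and the `new_property` key are assumptions made for illustration. In Python 3, json.dump always writes text, so a file opened with encoding="utf-8" should be all that is needed.

```python
import json
import os
import tempfile

# Sample data modeled on the question; the added key is hypothetical.
cards = [{"multiverseid": 262906, "name": "游隼狮鹫", "language": "Chinese Simplified"}]
cards[0]["new_property"] = True

# Throwaway location so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "testJson.json")

# Text mode with an explicit encoding; ensure_ascii=False keeps the characters as-is.
with open(path, "w", encoding="utf-8") as outfile:
    json.dump(cards, outfile, ensure_ascii=False)

# Read the raw text back to confirm nothing was escaped.
with open(path, encoding="utf-8") as infile:
    raw = infile.read()
```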

Answer 3

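The JSON module handles encoding and decoding for you, so you can simply open the input and output files in binary mode. The JSON module assumes UTF-8 encoding, but this can be changed with the encoding parameter of the load() and dump() methods.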

with open('multiIdName.json', 'rb') as json_data:
    cards = json.load(json_data)

Then (this part was struck out in the original answer):

with open("testJson.json", 'wb') as outfile:
    json.dump(cards, outfile, ensure_ascii=False)

Thanks to @Antti Haapala for pointing out that the Python 2.x JSON module returns either unicode or str depending on the contents of the object.

You have to add a sanity check to make sure the result is unicode before writing it through io:

with io.open("testJson.json", 'w', encoding="utf-8") as outfile:
    my_json_str = json.dumps(my_obj, ensure_ascii=False)
    if isinstance(my_json_str, str):
        my_json_str = my_json_str.decode("utf-8")

    outfile.write(my_json_str)

Answer 4

Can you try the following?

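# -*- coding:utf-8 -*-
import codecs
import json

# On Python 2, codecs.open without an encoding just returns a plain file,
# so the encoding is passed explicitly here (an editorial fix).
with codecs.open("test.json", "w", encoding="utf-8") as file:
    json.dump(my_list, file, indent=4, ensure_ascii=False)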

json.dump fails with 'must be unicode, not str' TypeError
