Question

I have the following use case:

From my data I generate a JSON document, part of which is in Hebrew. For example:

import json
j = {}
city =u'חיפה' #native unicode
j['results']= []
j['results'].append({'city':city}) #Also tried to city.encode('utf-8') and other encodings

To produce a JSON file that doubles as my app's DB (a micro geo-app) and as a file my users can edit directly to fix the data, I use the json lib and:

to_save = json.dumps(j)
with open('test.json', 'wb') as f:  # also tried with w instead of wb flag.
    f.write(to_save)

The problem is that the JSON comes out Unicode-escaped: u'חיפה' is written as "\u05d7\u05d9\u05e4\u05d4".

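Most scripts and applications have no trouble reading the escaped Unicode string, but my USERS do! Since they need to edit the JSON directly, they can't make sense of the Hebrew text.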

So, the question: how should I write the JSON so that the Hebrew characters are displayed when the file is opened in another editor?

I'm not sure this can be solved, because I suspect JSON is Unicode and I can't use ASCII in it, but I'm not certain.

Thanks for your help

Answer 1

Use the ensure_ascii=False argument.

>>> import json
>>> city = u'חיפה'
>>> print(json.dumps(city))
"\u05d7\u05d9\u05e4\u05d4"
>>> print(json.dumps(city, ensure_ascii=False))
"חיפה"

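A quick Python 3 sketch showing that the two outputs above are just two spellings of the same JSON string: each \uXXXX escape is the character's code point in lowercase hex, and json.loads decodes either form back to the same value. The variable names are illustrative only.

```python
import json

city = "חיפה"

escaped = json.dumps(city)                      # default: ensure_ascii=True
literal = json.dumps(city, ensure_ascii=False)  # Hebrew kept as-is

# Rebuild the escaped form by hand from each character's code point.
manual = '"' + "".join(f"\\u{ord(ch):04x}" for ch in city) + '"'
print(escaped == manual)  # True

# Both spellings are valid JSON and decode to the same Python string.
print(json.loads(escaped) == json.loads(literal) == city)  # True
```
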
According to the json.dump documentation:

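If ensure_ascii is true (the default), all non-ASCII characters in the output are escaped with \uXXXX sequences, and the result is a str instance consisting of ASCII characters only. If ensure_ascii is false, some chunks written to fp may be unicode instances. This usually happens because the input contains unicode strings or the encoding parameter is used. Unless fp.write() explicitly understands unicode (as in codecs.getwriter()) this is likely to cause an error.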
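The quoted text describes Python 2. In Python 3, json.dump always writes str chunks, so the caveat becomes: the target must be a text stream, or a binary stream wrapped in an encoder such as the codecs.getwriter() mentioned in the quote. A small sketch of both cases, using an in-memory io.BytesIO buffer as a stand-in for a real file (the buffer and the data dict are illustrative assumptions):

```python
import codecs
import io
import json

data = {"results": [{"city": "חיפה"}]}

# A binary stream only accepts bytes, so json.dump's str chunks are rejected.
raw = io.BytesIO()
try:
    json.dump(data, raw, ensure_ascii=False)
except TypeError as exc:
    print("binary stream:", exc)

# codecs.getwriter("utf-8") wraps the stream so every str chunk is
# encoded to UTF-8 bytes before it reaches the underlying buffer.
buf = io.BytesIO()
writer = codecs.getwriter("utf-8")(buf)
json.dump(data, writer, ensure_ascii=False)

text = buf.getvalue().decode("utf-8")
print(text)  # {"results": [{"city": "חיפה"}]}
```
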

Your code should look like this:

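import json
j = {'results': [u'חיפה']}
# ensure_ascii=False keeps the Hebrew as real characters, not \uXXXX escapes
to_save = json.dumps(j, ensure_ascii=False)
# the file is opened in binary mode, so encode the text to UTF-8 bytes first
with open('test.json', 'wb') as f:
    f.write(to_save.encode('utf-8'))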

or

import codecs
import json
j = {'results': [u'חיפה']}
to_save = json.dumps(j, ensure_ascii=False)
with codecs.open('test.json', 'wb', encoding='utf-8') as f:
    f.write(to_save)

or

import codecs
import json
j = {'results': [u'חיפה']}
with codecs.open('test.json', 'wb', encoding='utf-8') as f:
    json.dump(j, f, ensure_ascii=False)
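On Python 3 the codecs module is not required, because the built-in open() accepts an encoding argument directly. A minimal sketch assuming Python 3; indent=2 is an optional addition not in the answer above, included only because users edit the file by hand, and the read-back step is there just to show what an editor would see.

```python
import json

data = {"results": [{"city": "חיפה"}]}

# Text mode with an explicit encoding: json.dump writes str chunks and the
# file object encodes them as UTF-8, so no manual .encode() is needed.
# indent=2 puts each key on its own line, which is easier to edit by hand.
with open("test.json", "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)

# Read the raw file back: the Hebrew appears as-is, not as \uXXXX escapes.
with open("test.json", encoding="utf-8") as f:
    content = f.read()

print(content)
```
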

Serializing to JSON while preserving Hebrew characters
