There are many questions about Unicode decoding on SO, and none of them fits my needs.
Say I need to process some JSON data, part of which looks like this:
u'message': {
    u'body': u'\u0441\u043f\u0430\u0441\u0438\u0431\u043e \u0431\u043e\u043b\u044c\u0448\u043e\u0435',
    u'user_id': 374298,
    u'title': u' ... ',
    u'date': 1461496370,
    u'read_state': 0,
    u'id': 4325364,
    u'out': 1
}
The only goal is to get rid of the Unicode escapes so that the "body" line contains human-readable text:
'message': {
    'body': 'cпасибо большое',
    'user_id': 374298,
    'title': ' ... ',
    'date': 1461496370,
    'read_state': 0,
    'id': 4325364,
    'out': 1
}
Since there is a lot of JSON data, I modified some Python source code I found, as follows:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
filePath = "/home/user/foo.txt"
outPath = filePath + ".new"
outText = ""
inF = open(filePath, "rb")
fContent = unicode(inF.read(), 'utf-8')
inF.close()
for everysymbol in fContent:
    if isinstance(everysymbol, unicode):
        outText += everysymbol.decode('cp1251')
    else:
        pass
fContent = outText
outF = open(outPath, "wb")
outF.write(outText.decode("utf-8"))
outF.close()
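Unfortunately, this code does not change anything in the target file. How can I actually decode my data into something human-readable?
Answer 0 (score: 4)
Your data is not JSON but already-parsed JSON data. Python 2 uses repr() to print dictionaries and lists, and repr() displays non-ASCII characters as escape codes. Print the string directly to see its content; that uses str():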
import json
# Your already parsed data
data = {u'message': {
    u'body': u'\u0441\u043f\u0430\u0441\u0438\u0431\u043e \u0431\u043e\u043b\u044c\u0448\u043e\u0435',
    u'user_id': 374298,
    u'title': u' ... ',
    u'date': 1461496370,
    u'read_state': 0,
    u'id': 4325364,
    u'out': 1}}
# Write it to a file to generate real json-encoded data:
with open('foo.json','w') as f:
    json.dump(data,f,indent=2)
# Display actual json-encoded file:
with open('foo.json') as f:
    print f.read()
print
# reparse the data
with open('foo.json') as f:
    data = json.load(f)
# Printing dictionaries and lists escapes non-ASCII
print data
print
# But print the string directly and it is fine (assuming your terminal and font supports the characters):
print data[u'message'][u'body']
Output:
{
  "message": {
    "body": "\u0441\u043f\u0430\u0441\u0438\u0431\u043e \u0431\u043e\u043b\u044c\u0448\u043e\u0435",
    "user_id": 374298,
    "title": " ... ",
    "date": 1461496370,
    "read_state": 0,
    "id": 4325364,
    "out": 1
  }
}
{u'message': {u'body': u'\u0441\u043f\u0430\u0441\u0438\u0431\u043e \u0431\u043e\u043b\u044c\u0448\u043e\u0435', u'user_id': 374298, u'title': u' ... ', u'date': 1461496370, u'read_state': 0, u'id': 4325364, u'out': 1}}
спасибо большое
Note that Python 3 no longer escapes printable non-ASCII characters in repr().
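To make the Python 3 behaviour concrete, here is a minimal sketch (Python 3 only). The variable names are made up for illustration, and the dictionary is a shortened copy of the data from the question. It compares repr(), the built-in ascii(), and json.dumps() with and without ensure_ascii=False:

```python
import json

# Shortened copy of the question's data; the names here are illustrative only.
body = "\u0441\u043f\u0430\u0441\u0438\u0431\u043e \u0431\u043e\u043b\u044c\u0448\u043e\u0435"
data = {"message": {"body": body, "user_id": 374298}}

# Python 3: repr() leaves printable non-ASCII characters as they are.
repr_text = repr(data)

# ascii() escapes non-ASCII characters, much like Python 2's repr() did.
ascii_text = ascii(data)

# json.dumps() escapes non-ASCII by default; ensure_ascii=False keeps it readable.
escaped_json = json.dumps(data)
readable_json = json.dumps(data, ensure_ascii=False)

print(repr_text)
print(ascii_text)
print(escaped_json)
print(readable_json)
```

Whether the readable lines display correctly still depends on the terminal encoding and font, as noted above.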
Answer 1 (score: 0)
#!/usr/bin/env python3
import json
filePath = "/home/user/foo.txt"
outPath = filePath + ".new"
with open(filePath, "rb") as f:
    fContent = json.loads(f.read())
with open(outPath, "wb") as f:
    f.write(json.dumps(fContent, ensure_ascii=False).encode())
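If the file actually holds text printed by Python 2 (u'...' prefixes and single quotes, as in the question) rather than real JSON, json.loads() will reject it. One possible workaround, sketched here under that assumption, is to parse the text with ast.literal_eval() first. The sample string is a shortened, hypothetical stand-in for the file contents, not the asker's real data:

```python
import ast
import json

# Hypothetical sample: text as printed by Python 2's repr(), not real JSON.
# The raw string keeps the backslash escapes literal, as they would be in a file.
repr_text = r"{u'message': {u'body': u'\u0441\u043f\u0430\u0441\u0438\u0431\u043e', u'id': 4325364}}"

# json.loads() rejects this text because of the single quotes and u prefixes.
try:
    json.loads(repr_text)
    is_json = True
except ValueError:
    is_json = False

# ast.literal_eval() evaluates Python literals only, without running arbitrary code.
parsed = ast.literal_eval(repr_text)

# Re-encode as real JSON, keeping the Cyrillic text readable.
readable = json.dumps(parsed, ensure_ascii=False)
print(readable)
```

This only works when the text is a complete Python literal; a fragment such as a bare `u'message': {...}` pair would first have to be wrapped in braces.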