Python GAE: json.loads raises ValueError (Unpaired high surrogate) only on the production server

Time: 2013-03-06 00:21:48

Tags: json google-app-engine python-2.7 urlfetch

On GAE with Python, I use urlfetch to fetch a JSON string from Flickr. When I try to load that string with json.loads on the production server, it raises the exception "ValueError(Unpaired high surrogate)".

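When I run json.loads on the same string in the development console, it loads into a dict as expected (see below). I have successfully loaded several other JSON strings from Flickr with the same code. Something in the JSON string below raises the ValueError only on the production server.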

import json

s = """{"photo":{"id":"191019103", "secret":"d7a8bb95bc", "server":"72", "farm":1, "dateuploaded":"1153079847", "isfavorite":0, "license":"1", "safety_level":"0", "rotation":0, "originalsecret":"d7a8bb95bc", "originalformat":"jpg", "owner":{"nsid":"13968020@N00", "username":"\ud800dc80 jgraham", "realname":"", "location":"", "iconserver":"38", "iconfarm":1}, "title":{"_content":"By the Year 2000 All Our Food Will be in the Form of Tiny Pills"}, "description":{"_content":""}, "visibility":{"ispublic":1, "isfriend":0, "isfamily":0}, "dates":{"posted":"1153079847", "taken":"2006-07-15 14:31:16", "takengranularity":"0", "lastupdate":"1282690106"}, "views":"984", "editability":{"cancomment":0, "canaddmeta":0}, "publiceditability":{"cancomment":1, "canaddmeta":0}, "usage":{"candownload":1, "canblog":0, "canprint":0, "canshare":1}, "comments":{"_content":"18"}, "notes":{"note":[]}, "people":{"haspeople":0}, "tags":{"tag":[{"id":"1207251-191019103-2909", "author":"13968020@N00", "raw":"Birmingham", "_content":"birmingham", "machine_tag":0}, {"id":"1207251-191019103-77552", "author":"13968020@N00", "raw":"Bullring", "_content":"bullring", "machine_tag":0}, {"id":"1207251-191019103-463", "author":"13968020@N00", "raw":"Abstract", "_content":"abstract", "machine_tag":0}, {"id":"1207251-191019103-1174", "author":"13968020@N00", "raw":"Architecture", "_content":"architecture", "machine_tag":0}, {"id":"1207251-191019103-141", "author":"13968020@N00", "raw":"Blue", "_content":"blue", "machine_tag":0}, {"id":"1207251-191019103-2194948", "author":"13968020@N00", "raw":"i500", "_content":"i500", "machine_tag":0}, {"id":"1207251-191019103-11820", "author":"13968020@N00", "raw":"Explore", "_content":"explore", "machine_tag":0}, {"id":"1207251-191019103-3254511", "author":"13968020@N00", "raw":"utata_feature", "_content":"utatafeature", "machine_tag":0}]}, "urls":{"url":[{"type":"photopage", "_content":"http:\/\/www.flickr.com\/photos\/jgraham\/191019103\/"}]}, "media":"photo"}, 
"stat":"ok"}"""

print json.loads(s) #prints dict
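One detail worth noting in the string above: the "username" value contains \ud800 followed by plain dc80 rather than a second \uXXXX escape, so the high surrogate has no low surrogate paired with it. A minimal sketch for locating such escapes in raw JSON text follows. The helper name find_unpaired_high_surrogates is hypothetical, and the regex is a simplification that does not account for escaped backslashes (for example \\ud800).

```python
import re

# A \uD800-\uDBFF escape that is NOT immediately followed by a
# \uDC00-\uDFFF escape. Simplification: an escaped backslash such as
# \\ud800 would be reported as a false positive.
UNPAIRED_HIGH = re.compile(
    r'\\u[dD][89abAB][0-9a-fA-F]{2}(?!\\u[dD][c-fC-F][0-9a-fA-F]{2})'
)


def find_unpaired_high_surrogates(raw_json):
    """Return (offset, escape) pairs for unpaired high-surrogate escapes.

    Hypothetical helper for diagnosis only; it inspects the raw text
    and never calls json.loads.
    """
    return [(m.start(), m.group(0)) for m in UNPAIRED_HIGH.finditer(raw_json)]
```

Running it on the raw response before calling json.loads would show where the problematic escape sits, which helps confirm that the input, not the fetching code, triggers the error.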

4 Answers:

Answer 0 (score: 0)

sudo pip install simplejson==3.6.5

import simplejson
simplejson.loads('{"":"\\ud800"}')

I had the same problem with a low surrogate (\udfb6).

Answer 1 (score: 0)

This issue was fixed in Python 2.7.7 and later.

http://bugs.python.org/issue11489

https://hg.python.org/cpython/raw-file/v2.7.7/Misc/NEWS

However, as of March 11, 2016, Google App Engine runs Python 2.7.5 in production, and its json module does not include the patch for issue 11489.
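Rather than relying on the reported version number, one could probe the runtime's behaviour directly. The following is a minimal sketch; the function name json_accepts_lone_surrogate is hypothetical, and the probe input is the same one used in the other answers.

```python
import json


def json_accepts_lone_surrogate():
    """Return True if this runtime's json module parses a lone \\ud800 escape.

    Hypothetical probe: unpatched runtimes are expected to raise
    ValueError here, while patched ones return a dict.
    """
    try:
        json.loads('{"": "\\ud800"}')
    except ValueError:
        return False
    return True
```

Logging the result once at startup, on both the development and production servers, would make a difference between the two environments visible.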

I talked with the GAE support team about this, and they filed an issue on the Google public issue tracker:

https://code.google.com/p/googleappengine/issues/detail?id=12823

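In the meantime, using the simplejson module instead of the standard json module, as mihaicc suggested, looks like the best solution. I tested simplejson version 3.8.2, and it runs on GAE without producing the error.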
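A common way to apply this workaround is to prefer simplejson when it is available and fall back to the standard module otherwise. This is a sketch only: it assumes simplejson has been bundled with the application, and parse_json is a hypothetical wrapper name.

```python
try:
    # Third-party package; assumed to be vendored with the app.
    import simplejson as json
except ImportError:
    # Standard library fallback when simplejson is not installed.
    import json


def parse_json(text):
    """Parse JSON text with whichever module was imported above."""
    return json.loads(text)
```

Keeping the import in one place means the rest of the code can call parse_json without caring which implementation is in use.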

Answer 2 (score: 0)

The Python 2 runtime of GAE Standard migrated from 2.7.5 to 2.7.12 in June 2017, so this is no longer a problem. You can test it at https://shell-hrd.appspot.com/:

Google App Engine/1.9.86
Python 2.7.12 (default, Jun 12 2019, 11:33:04) 
[GCC 4.2.1 Compatible Clang google3-trunk (trunk r361749)]

>>> import json
>>> json.loads('{"":"\\ud800"}')
{u'': u'\ud800'}

Answer 3 (score: 0)

If you have a large JSON file with one record per line and don't mind losing some lines, you can simply filter out the offending lines.

grep -v "\\\\ud" file.json > file2.json
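Note that the grep pattern drops every line containing a \ud... escape, including valid non-surrogate characters in the \ud000 to \ud7ff range. A slightly narrower Python sketch of the same idea is shown below. The name drop_surrogate_lines is hypothetical, and this version still discards lines with correctly paired surrogates, so it remains a lossy filter.

```python
import re

# Matches any \uD800-\uDFFF escape in raw JSON text (hex is case-insensitive).
SURROGATE_ESCAPE = re.compile(r'\\u[dD][89a-fA-F][0-9a-fA-F]{2}')


def drop_surrogate_lines(lines):
    """Keep only the lines that contain no surrogate-range \\u escape.

    Hypothetical helper; lines with valid surrogate pairs are dropped
    too, just as with the grep command.
    """
    return [line for line in lines if not SURROGATE_ESCAPE.search(line)]
```

For a file, the input could be the open file object and the result written to a second file, mirroring file.json and file2.json in the command above.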