I am trying to decode streaming messages with json, but it raises the following ValueError:
File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 369, in decode
raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 2 column 1 - line 23571 column 1 (char 126 - 72358378)
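I searched SO, and the likely cause is the format of my streaming messages. If so, how can I split my streaming messages into multiple dicts in a Pythonic way?
Some parts of my streaming messages: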
{"delete":{"status":{"id":486174602859528192,"id_str":"486174602859528192","user_id":2455171405,"user_id_str":"2455171405"}}}
{"delete":{"status":{"id":244223991382937601,"id_str":"244223991382937601","user_id":236405781,"user_id_str":"236405781"}}}
{"delete":{"status":{"id":243934303371792384,"id_str":"243934303371792384","user_id":236405781,"user_id_str":"236405781"}}}
{"delete":{"status":{"id":320790822129913856,"id_str":"320790822129913856","user_id":320634758,"user_id_str":"320634758"}}}
{"delete":{"status":{"id":399494495630155776,"id_str":"399494495630155776","user_id":1227287820,"user_id_str":"1227287820"}}}
{"delete":{"status":{"id":399528981206007808,"id_str":"399528981206007808","user_id":1227287820,"user_id_str":"1227287820"}}}
{"created_at":"Wed Jul 09 12:16:27 +0000 2014","id":486846341600251904,"id_str":"486846341600251904","text":"#RT \u0430 \u0437\u043d\u0430\u0435\u0442\u0435 \u043f\u043e\u0447\u0435\u043c\u0443 \u044f \u043d\u0435 \u0431\u0443\u0434\u0443 \u043f\u043e\u0434\u0434\u0435\u0440\u0436\u0438\u0432\u0430\u0442\u044c \u0442\u0440\u0435\u043d\u0434 \u043e \u041d\u0438\u043a\u043e\u043b\u044c?","source":"\u003ca href=\"http:\/\/www.ckhi.com.ua\" rel=\"nofollow\"\u003e\"Original atok\"\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":2530930573,"id_str":"2530930573","name":"\u041b\u0435\u043f\u0430\u0448\u0438\u043da \u041f\u0435\u043b\u0430\u0433\u0435\u044f","screen_name":"miki4390","location":"\u0421\u0430\u043d\u043a\u0442-\u041f\u0435\u0442\u0435\u0440\u0431\u0443\u0440\u0433","url":"https:\/\/twitter.com\/miki4390","description":"\u042f-\u0442\u043e \u0442\u0435\u0440\u043f\u043b\u044e. 
\u041d\u043e \u0442\u044b-\u0442\u043e \u043f\u043e\u0436\u0430\u043b\u0435\u0435\u0448\u044c...","protected":false,"verified":false,"followers_count":0,"friends_count":0,"listed_count":0,"favourites_count":0,"statuses_count":11,"created_at":"Wed May 28 21:41:41 +0000 2014","utc_offset":null,"time_zone":null,"geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"0084B4","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/abs.twimg.com\/sticky\/default_profile_images\/default_profile_3_normal.png","profile_image_url_https":"https:\/\/abs.twimg.com\/sticky\/default_profile_images\/default_profile_3_normal.png","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/2530930573\/1404903710","default_profile":true,"default_profile_image":true,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"RT","indices":[0,3]}],"trends":[],"urls":[],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"medium","lang":"ru"}
{"delete":{"status":{"id":295365152621080577,"id_str":"295365152621080577","user_id":710752640,"user_id_str":"710752640"}}}
Answer 0 (score: 3)
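Your data is actually a collection of JSON lines: one complete JSON object per line.
Passing all the lines to json.loads at once fails, because the decoder expects a single JSON value and reports everything after the first object as "Extra data".
Reading the lines one by one and decoding each of them works fine.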
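To see why the whole text fails while single lines succeed, here is a minimal sketch. The two short objects in `text` are made-up stand-ins for the real messages, not data from the question.

```python
import json

# Two complete JSON objects separated by a newline, as in the stream.
# These values are illustrative placeholders.
text = '{"a": 1}\n{"b": 2}\n'

try:
    json.loads(text)
    error_message = None
except ValueError as exc:
    # On Python 3 this is json.JSONDecodeError, a subclass of ValueError.
    error_message = str(exc)

# Decoding each line separately succeeds.
decoded = [json.loads(line) for line in text.splitlines()]
```

After running this, `error_message` holds the "Extra data" complaint and `decoded` is a list with one dict per line.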
With the JSON lines saved in a file named "jslines.json", the following code:
>>> import json
>>> fname = "jslines.json"
>>> with open(fname) as f:
...     for line in f:
...         print(json.loads(line))
decodes and prints every line.
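The line-by-line approach can be wrapped in a small generator so that the caller receives dicts one at a time. This is only a sketch, assuming every object occupies exactly one line; `iter_json_lines` is a hypothetical helper name, not part of the `json` module. It also skips blank lines, since `json.loads` raises ValueError on an empty string.

```python
import json


def iter_json_lines(lines):
    """Yield one decoded object per non-blank line.

    `lines` can be any iterable of strings, such as an open file.
    """
    for line in lines:
        line = line.strip()
        if not line:
            # An empty string is not valid JSON, so skip blank lines.
            continue
        yield json.loads(line)


# Two of the "delete" messages from the question, with a blank line between.
sample = [
    '{"delete":{"status":{"id":486174602859528192,"id_str":"486174602859528192","user_id":2455171405,"user_id_str":"2455171405"}}}\n',
    '\n',
    '{"delete":{"status":{"id":244223991382937601,"id_str":"244223991382937601","user_id":236405781,"user_id_str":"236405781"}}}\n',
]
messages = list(iter_json_lines(sample))
```

With a real file, the same helper could be used as `for msg in iter_json_lines(open(fname)): ...`, which keeps only one line in memory at a time.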
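An alternative is to use these lines to build a valid JSON structure, in this case an array. We get the list of lines (as text), join them with ",", and enclose the result between "[" and "]".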
>>> with open(fname) as f:
...     lines = list(f)
Now we have the list of lines.
Build the resulting JSON text:
>>> jstext = "[" + ",".join(lines) + "]"
and load it into a list of dictionaries:
>>> json.loads(jstext)
This works for the data you provided.
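If the objects are not guaranteed to sit one per line, a further option is to decode them one after another from a single string with `json.JSONDecoder.raw_decode`, which returns the decoded value together with the index where it ended. The sketch below goes beyond the two approaches described here and is offered under that assumption; `iter_concatenated_json` is a hypothetical helper name, and the sample text is made up.

```python
import json


def iter_concatenated_json(text):
    """Yield each top-level JSON value found in `text`, in order."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not accept leading whitespace, so skip it first.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos == end:
            break
        # raw_decode returns (value, index just past the value).
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Illustrative input: objects separated by newlines or plain spaces.
sample_text = '{"a": 1}\n{"b": 2} {"c": 3}\n'
values = list(iter_concatenated_json(sample_text))
```

This needs the whole text in memory, so for a very large stream the line-by-line loop is likely the better fit.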