Question

I'm trying to read a stream of tweets using Python.

The lines in my file appear to be correct, like this:

{"delete":{"status":{"id":471622360253345792,"user_id":2513833684,"id_str":"471622360253345792","user_id_str":"2513833684"}}}

When I read this line into memory with readline and call json.loads(), I get the following error:

No JSON object could be decoded

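I suppose I have to convert the line somehow before calling json.loads()?

Some notes:

If I paste the string from the file into IPython and call json.loads() on it, everything works.
When I print the line in IPython, it adds a strange character at the front and puts spaces between the remaining characters. The first few characters look like:

{ " d e l e t e " : { " s t a t u s
If I display the string in IPython without calling print, the first few characters are: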

\xff\xfe{\x00"\x00d\x00e\x00l\x00e\x00t\x00e\x00"\x00:\x00{\x00"\x00s\x00t\x00a\x00t\x00u\x00s\x00"\x00

I'm not sure how to fix this.

Edit: As requested, the code that reads the tweet stream is here:

https://github.com/uwescience/datasci_course_materials/blob/master/assignment1/twitterstream.py

Answer 1

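By the looks of it, you have some non-ASCII text, and your parser may not be handling the different encoding.

If you look at the documentation for the json library, you'll see: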

If the contents of fp are encoded with an ASCII based encoding other than UTF-8 
(e.g. latin-1), then an appropriate encoding name must be specified. Encodings 
that are not ASCII based (such as UCS-2) are not allowed, and should be wrapped 
with codecs.getreader(encoding)(fp), or simply decoded to a unicode object and 
passed to loads().

So I would check that your JSON is well-formed, and then look at the encoding.
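As a rough illustration of the documentation's suggestion, the sketch below wraps a binary file object with codecs.getreader before parsing. It assumes the file is UTF-16 with a BOM, as the \xff\xfe prefix in the question suggests; the temporary file and the variable names are made up for the example.

```python
import codecs
import json
import os
import tempfile

# Build a small UTF-16 sample file (BOM included) standing in for the tweet file.
sample = u'{"delete":{"status":{"id":471622360253345792,"user_id":2513833684}}}\n'
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(sample.encode("utf-16"))

# Wrap the binary file object so that reads return decoded text, not raw bytes.
with open(path, "rb") as fp:
    reader = codecs.getreader("utf-16")(fp)
    text = reader.read()

# Parse one JSON document per non-empty line.
tweets = [json.loads(line) for line in text.splitlines() if line.strip()]

os.remove(path)
print(tweets[0]["delete"]["status"]["id"])
```

If the file turns out to use a different encoding, only the name passed to codecs.getreader should need to change.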

Answer 2

json.loads(twitter_data, encoding='utf-16')
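Note that the encoding keyword is a Python 2 era argument; as far as I know it is ignored in Python 3 and was removed in Python 3.9. A more portable variant, sketched here under the assumption that the line was read in binary mode and is UTF-16 with a BOM, is to decode the bytes explicitly and pass the resulting text to json.loads. The names raw_line and parse_utf16_line are hypothetical.

```python
import json

# Hypothetical stand-in for one line read in binary mode:
# a BOM followed by UTF-16 encoded text.
raw_line = u'{"delete":{"status":{"id_str":"471622360253345792"}}}'.encode("utf-16")


def parse_utf16_line(raw):
    """Decode UTF-16 bytes to text, then parse the text as JSON."""
    text = raw.decode("utf-16")  # the BOM is consumed and selects the byte order
    return json.loads(text)


tweet = parse_utf16_line(raw_line)
print(tweet["delete"]["status"]["id_str"])
```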

Answer 3

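Are you using Windows for the assignment? Text files retrieved under Windows default to the UCS-2 LE BOM encoding, which json.loads() does not recognize. You can either use a Linux operating system, or use third-party software such as Notepad++, which makes it easy to save the file as UTF-8.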
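If switching tools is inconvenient, the same conversion can probably be done in Python itself. The sketch below assumes the source file is UTF-16 with a BOM; convert_utf16_to_utf8 and the file names are hypothetical, and the sample file is created only so the example runs on its own.

```python
import io
import json
import os
import tempfile


def convert_utf16_to_utf8(src_path, dst_path):
    """Re-encode a UTF-16 text file as UTF-8, line by line."""
    with io.open(src_path, "r", encoding="utf-16") as src:
        with io.open(dst_path, "w", encoding="utf-8") as dst:
            for line in src:
                dst.write(line)


# Create a UTF-16 sample file, convert it, then parse the converted copy.
tmp_dir = tempfile.mkdtemp()
src_file = os.path.join(tmp_dir, "output_utf16.txt")
dst_file = os.path.join(tmp_dir, "output_utf8.txt")

with io.open(src_file, "w", encoding="utf-16") as f:
    f.write(u'{"delete":{"status":{"id":471622360253345792}}}\n')

convert_utf16_to_utf8(src_file, dst_file)

with io.open(dst_file, "r", encoding="utf-8") as f:
    first_tweet = json.loads(f.readline())

print(first_tweet["delete"]["status"]["id"])
```

After the conversion, the readline plus json.loads() approach from the question should work unchanged on the new file.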
