Question

I'm trying to read a series of JSON files and convert them into a pandas DataFrame, but none of the examples I've followed work for the reading step.

Here is a sample of one of my JSON files:

{
    "created_at": "Thu Nov 02 01:09:12 +0000 2017",
    "text": "RT @coindesk: SEC: Celebrity ICO Endorsements Could Be Illegal gHoWduXOBp t.co/iyWla0Ryuk",
    "tweet_id": 925892516087558145,
    "user_id": 153962533,
    "user_name": "Christine Duhaime"
}{
    "created_at": "Thu Nov 02 01:09:44 +0000 2017",
    "text": "Cornell Professor C t.co/RuNu6UQyr9",
    "tweet_id": 925892650884108289,
    "user_id": 1255045351,
    "user_name": "Local SEO Somerset"
}

I tried:

import codecs
import json
with codecs.open('./output/streamer_20171022-2010.json', 'r+', encoding='utf-8') as data_file:
    data = json.load(data_file)

Result:

JSONDecodeError: Extra data: line 1 column 416 (char 415)

I also tried reading it line by line... without success.

Any ideas?

Answer 1

Your JSON file is not valid. Valid JSON can have only one top-level element.
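To see the rule in isolation, here is a minimal sketch using only the standard `json` module. The two tiny strings are made-up stand-ins for the file contents, not data from the question: the first has two top-level objects back to back and is rejected, the second wraps the same objects in an array and parses.

```python
import json

# Two top-level objects back to back, like the file in the question (toy data).
two_objects = '{"a": 1}{"b": 2}'

try:
    json.loads(two_objects)
    error_message = None
except json.JSONDecodeError as exc:
    # exc.msg holds the short reason without the line/column details.
    error_message = exc.msg

# The same two objects inside one array form a single top-level element.
wrapped = json.loads('[{"a": 1}, {"b": 2}]')

print(error_message)
print(wrapped)
```

The parser reads the first object successfully and then complains about whatever follows it, which is why the error in the question points at the position where the second object begins.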

Try putting the top-level objects inside an array:

[
    {
        "created_at": "Thu Nov 02 01:09:12 +0000 2017",
        "text": "RT @coindesk: SEC: Celebrity ICO Endorsements Could Be Illegal gHoWduXOBp t.co/iyWla0Ryuk",
        "tweet_id": 925892516087558145,
        "user_id": 153962533,
        "user_name": "Christine Duhaime"
    },
    {
        "created_at": "Thu Nov 02 01:09:44 +0000 2017",
        "text": "Cornell Professor C t.co/RuNu6UQyr9",
        "tweet_id": 925892650884108289,
        "user_id": 1255045351,
        "user_name": "Local SEO Somerset"
    }
]
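If editing the files by hand is not practical, the objects can also be pulled out one at a time. The following is only a sketch of that alternative, not part of the original answer: `parse_concatenated_json` is a hypothetical helper name, the sample string is made-up data, and the approach relies on `json.JSONDecoder.raw_decode`, which parses one value and reports where it ended.

```python
import json


def parse_concatenated_json(text):
    """Return a list of the top-level JSON values found back to back in text.

    Hypothetical helper for illustration; not from the original answer.
    """
    decoder = json.JSONDecoder()
    records = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # Parse one value starting at pos; the returned index is where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        records.append(obj)
    return records


# Made-up sample shaped like the file in the question: "}{" with no separator.
sample = '{"tweet_id": 1, "user_name": "a"}{"tweet_id": 2, "user_name": "b"}\n'
records = parse_concatenated_json(sample)
print(records)
```

Each record is a plain dict, so the resulting list can be passed to `pandas.DataFrame(records)` to get one row per object. For a real file, read its full contents into a string first (for example with `open(path, encoding='utf-8').read()`, where `path` stands for your file's location) and pass that string to the helper.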
