How do I load this JSON file into a Pandas DataFrame?

Asked: 2019-04-18 09:06:59

Tags: python json pandas

I have never worked with JSON files before. I have this news classification dataset and I want to get it into a Pandas DataFrame. It looks like this:

{"content": "Unions representing workers at Turner   Newall say they are 'disappointed' after talks with stricken parent firm Federal Mogul.","annotation":{"notes":"","label":["Business"]},"extras":null,"metadata":{"first_done_at":1521027375000,"last_updated_at":1521027375000,"sec_taken":0,"last_updated_by":"nlYZXxNBQefF2u9VX52CdONFp0C3","status":"done","evaluation":"NONE"}}
{"content": "SPACE.com - TORONTO, Canada -- A second\\team of rocketeers competing for the  #36;10 million Ansari X Prize, a contest for\\privately funded suborbital space flight, has officially announced the first\\launch date for its manned rocket.","annotation":{"notes":"","label":["SciTech"]},"extras":null,"metadata":{"first_done_at":1521027375000,"last_updated_at":1521027375000,"sec_taken":0,"last_updated_by":"nlYZXxNBQefF2u9VX52CdONFp0C3","status":"done","evaluation":"NONE"}}

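There are many more entries, but I have only posted two of them. Each entry is enclosed in {} and has 4 keys: "content", "annotation", "extras", and "metadata". I want to use these keys as the columns of the DataFrame.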

I tried both the json library and the pandas.read_json function, but both gave me errors.

with open('News-Classification-DataSet.json') as data_file:
  df=json.load(data_file)

This raises an error: JSONDecodeError: Extra data: line 2 column 1 (char 378)

2 answers:

Answer 0 (score: 2)

I believe you have to read this file line by line, because taken as a whole it is not valid JSON: each line is a separate JSON object.
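To see why parsing the whole file fails while parsing each line succeeds, here is a minimal sketch. The two-record string is made up for illustration and stands in for the file contents.

```python
import json

# Two JSON objects separated by a newline, like the file in the question.
text = '{"content": "first"}\n{"content": "second"}\n'

try:
    # The parser reads the first object, then finds more text after it.
    json.loads(text)
except json.JSONDecodeError as err:
    message = err.msg        # short description of the failure
    error_line = err.lineno  # line on which the unexpected text starts

# Parsing one line at a time works, because each line is valid JSON by itself.
parsed = [json.loads(line) for line in text.splitlines()]
print(message, error_line, parsed)
```

The "Extra data: line 2 column 1" message in the question points at the start of the second record for the same reason.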

So read it like this:

import json

data = []
with open('News-Classification-DataSet.json') as f:
    for line in f:
        data.append(json.loads(line))

You should now be able to work with the data, but what do you want the DataFrame output to look like?

If you want to go straight to a DataFrame, you can do as was suggested:

import pandas as pd
df = pd.read_json("News-Classification-DataSet.json", lines=True)

But you have nested columns, and I don't know how you want to handle those.
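One possible way to handle the nested columns is to flatten them with pd.json_normalize. This is only a sketch: it assumes pandas 1.0 or later (where json_normalize is available at the top level), and the two shortened records below are made up to mirror the shape of the real file.

```python
import io
import json

import pandas as pd

# Made-up, shortened records in the same shape as the question's data.
sample = io.StringIO(
    '{"content": "first story", "annotation": {"notes": "", "label": ["Business"]}, '
    '"extras": null, "metadata": {"sec_taken": 0, "status": "done"}}\n'
    '{"content": "second story", "annotation": {"notes": "", "label": ["SciTech"]}, '
    '"extras": null, "metadata": {"sec_taken": 0, "status": "done"}}\n'
)

# Parse one JSON object per line, skipping any blank lines.
records = [json.loads(line) for line in sample if line.strip()]

# Flatten the nested dicts; nested keys become dotted column names
# such as "annotation.label" and "metadata.status".
flat = pd.json_normalize(records)

# Each label is a one-element list in this sample, so keep its first item.
flat["label"] = flat["annotation.label"].map(lambda labels: labels[0])
print(flat.columns.tolist())
```

With the real file, the io.StringIO object would be replaced by the open file handle from the loop above.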

Answer 1 (score: 2)

To load line-delimited JSON into a DataFrame:

import pandas as pd

df = pd.read_json("News-Classification-DataSet.json", lines=True)

To expand the dicts stored in a column into columns of their own:

pd.concat(
    [
        df["annotation"].apply(pd.Series),
        df[["content", "extras"]],
        df["metadata"].apply(pd.Series),
    ],
    axis=1,
)
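As an alternative sketch, the dict columns can also be expanded by building a DataFrame from the list of dicts, which avoids constructing one Series per row and may be faster on large files. The sample records and the expand helper are made up for illustration; they are not part of the original dataset or of pandas.

```python
import io

import pandas as pd

# Made-up, shortened records in the same shape as the question's data.
sample = io.StringIO(
    '{"content": "first story", "annotation": {"notes": "", "label": ["Business"]}, '
    '"extras": null, "metadata": {"sec_taken": 0, "status": "done"}}\n'
    '{"content": "second story", "annotation": {"notes": "", "label": ["SciTech"]}, '
    '"extras": null, "metadata": {"sec_taken": 0, "status": "done"}}\n'
)
df = pd.read_json(sample, lines=True)


def expand(column):
    # Hypothetical helper: turn a column of dicts into a DataFrame,
    # reusing df's index so the pieces line up in pd.concat.
    return pd.DataFrame(df[column].tolist(), index=df.index)


wide = pd.concat(
    [expand("annotation"), df[["content", "extras"]], expand("metadata")],
    axis=1,
)
print(wide.columns.tolist())
```

Either way, assign the result of pd.concat to a variable, since it returns a new DataFrame rather than modifying df in place.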