如何将嵌套字典转换为Pandas DataFrame?

时间:2019-07-31 21:30:25

标签: python json pandas dataframe dictionary

我想将调用的结果从API转换为数据框。 API调用的结果是一个嵌套的字典,但是生成的数据帧不是我需要的。

除了json_normalize之外,我还尝试了pd.DataFrame.from_dict。但是,直到现在一直没有成功。我也尝试过整理字典,但是什么也没有。

我使用了以下电话:

[73] results = requests.get(url).json()
results

输出为:

{'result': {'totalrows': 3124,
  'rows': [{'rownum': 1,
    'values': [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
     {'field': 'issueid', 'value': 472683},
     {'field': 'ticker', 'value': 'AAPL'},
     {'field': 'companyname', 'value': 'APPLE INC'},
     {'field': 'issuetitle', 'value': 'COM'},
     {'field': 'filerid', 'value': 1089387}]},
   {'rownum': 2,
    'values': [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
     {'field': 'issueid', 'value': 472683},
     {'field': 'ticker', 'value': 'AAPL'},
     {'field': 'companyname', 'value': 'APPLE INC'},
     {'field': 'issuetitle', 'value': 'COM'},
     {'field': 'filerid', 'value': 1086893}]},
   {'rownum': 3,
    'values': [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
     {'field': 'issueid', 'value': 472683},
     {'field': 'ticker', 'value': 'AAPL'},
     {'field': 'companyname', 'value': 'APPLE INC'},
     {'field': 'issuetitle', 'value': 'COM'},
     {'field': 'filerid', 'value': 1085803}]}

然后使用以下代码生成数据框:


[74] Owners = results['result']['rows']
df1 = json_normalize(Owners)
df1.head()

这是输出:

  rownum    values
0   1      [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'}, 
           {'field': 'issueid', 'value': 472683}, {'field': 
           'ticker', 'value': 'AAPL'}, {'field': 'companyname', 
           'value': 'APPLE INC'}, {'field': 'issuetitle', 'value': 
           'COM'}, {'field': 'filerid', 'value': 1089387} 

1   2      [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'}, 
           {'field': 'issueid', 'value': 472683}, {'field': 
           'ticker', 'value': 'AAPL'}, {'field': 'companyname', 
           'value': 'APPLE INC'}, {'field': 'issuetitle', 'value': 
           'COM'}, {'field': 'filerid', 'value': 1086893}

2   3      [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'}, {'field': 
           'issueid', 'value': 472683}, {'field': 'ticker', 'value': 'AAPL'}, 
           {'field': 'companyname', 'value': 'APPLE INC'}, {'field': 
           'issuetitle', 'value': 'COM'}, {'field': 'filerid', 'value': 1085803}

但是,我要获取具有以下格式的DataFrame:

Desired DataFrame format

2 个答案:

答案 0 :(得分:1)

您可以使用pandas.DataFrame.from_dict,但需要删除数据中所有不必要的数据。实际上,您只想保留field值和value每行。您可以通过列表理解来做到这一点:

data = [{ field["field"]:field["value"] for field in row['values']
                    } for row in data['result']["rows"]]
print(data)
# [{'querydate': '7/31/2019 3:19 PM', 
#     'issueid': 472683, 
#     'ticker': 'AAPL', 
#     'companyname': 'APPLE INC',
#     'issuetitle': 'COM',
#     'filerid': 1089387},
# {
#     'querydate': '7/31/2019 3:19 PM',
#     'issueid': 472683,
#     'ticker': 'AAPL',
#     'companyname': 'APPLE INC',
#     'issuetitle': 'COM',
#     'filerid': 1086893},
# {
#     'querydate': '7/31/2019 3:19 PM', 
#     'issueid': 472683, 
#     'ticker': 'AAPL', 
#     'companyname': 'APPLE INC', 
#     'issuetitle': 'COM', 
#     'filerid': 1085803
# }]

一旦有了这本字典,就可以调用from_dict方法:

df = pd.DataFrame.from_dict(data)
print(df)
#   companyname  filerid  issueid issuetitle          querydate ticker
# 0   APPLE INC  1089387   472683        COM  7/31/2019 3:19 PM   AAPL
# 1   APPLE INC  1086893   472683        COM  7/31/2019 3:19 PM   AAPL
# 2   APPLE INC  1085803   472683        COM  7/31/2019 3:19 PM   AAPL

如果要获取rownum作为列(或索引):

data = [{**{field["field"]:field["value"] for field in row['values']}, **{'rownum': row["rownum"]}} for row in data['result']["rows"]]

df = pd.DataFrame.from_dict(data)
print(df)
#   companyname  filerid  issueid issuetitle          querydate  rownum ticker
# 0   APPLE INC  1089387   472683        COM  7/31/2019 3:19 PM       1   AAPL
# 1   APPLE INC  1086893   472683        COM  7/31/2019 3:19 PM       2   AAPL
# 2   APPLE INC  1085803   472683        COM  7/31/2019 3:19 PM       3   AAPL

答案 1 :(得分:0)

天真嵌套用于循环尝试...

import pandas as pd

df = pd.DataFrame([])

for row in json["result"]["rows"]:
    rownum = row["rownum"]
    querydate = issueid = ticker = companyname = issuetitle = filerid = None
    for value_dict in row["values"]:
        if value_dict["field"] == "querydate":
            querydate = value_dict["value"]
        elif value_dict["field"] == "issueid":
            issueid = value_dict["value"]
        elif value_dict["field"] == "ticker":
            ticker = value_dict["value"]
        elif value_dict["field"] == "companyname":
            companyname = value_dict["value"]
        elif value_dict["field"] == "filerid":
            filerid = value_dict["value"]
    df = df.append(pd.DataFrame({"rownum": rownum,
                                 "querydate": querydate,
                                 "issueid": issueid,
                                 "ticker": ticker,
                                 "companyname": companyname,
                                 "issuetitle": issuetitle,
                                 "filerid": filerid,
                                }, index=[0]), ignore_index=True)

print(df)

JSON对象:

json = {
    "result": {
        "totalrows": 3,
        "rows": [
            {
                "rownum": 1,
                "values": [
                    {
                        "field": "querydate",
                        "value": "7/31/2019 3:19 PM"
                    },
                    {
                        "field": "issueid",
                        "value": 472683
                    },
                    {
                        "field": "ticker",
                        "value": "AAPL"
                    },
                    {
                        "field": "companyname",
                        "value": "APPLE INC"
                    },
                    {
                        "field": "issuetitle",
                        "value": "COM"
                    },
                    {
                        "field": "filerid",
                        "value": 1089387
                    }
                ]
            },
            {
                "rownum": 2,
                "values": [
                    {
                        "field": "querydate",
                        "value": "7/31/2019 3:19 PM"
                    },
                    {
                        "field": "issueid",
                        "value": 472683
                    },
                    {
                        "field": "ticker",
                        "value": "AAPL"
                    },
                    {
                        "field": "companyname",
                        "value": "APPLE INC"
                    },
                    {
                        "field": "issuetitle",
                        "value": "COM"
                    },
                    {
                        "field": "filerid",
                        "value": 1086893
                    }
                ]
            },
            {
                "rownum": 3,
                "values": [
                    {
                        "field": "querydate",
                        "value": "7/31/2019 3:19 PM"
                    },
                    {
                        "field": "issueid",
                        "value": 472683
                    },
                    {
                        "field": "ticker",
                        "value": "AAPL"
                    },
                    {
                        "field": "companyname",
                        "value": "APPLE INC"
                    },
                    {
                        "field": "issuetitle",
                        "value": "COM"
                    },
                    {
                        "field": "filerid",
                        "value": 1085803
                    }
                ]
            }
        ]
    }
}

输出:

   rownum          querydate  issueid ticker companyname issuetitle  filerid
0       1  7/31/2019 3:19 PM   472683   AAPL   APPLE INC        COM  1089387
1       2  7/31/2019 3:19 PM   472683   AAPL   APPLE INC        COM  1086893
2       3  7/31/2019 3:19 PM   472683   AAPL   APPLE INC        COM  1085803