我想将调用的结果从API转换为数据框。 API调用的结果是一个嵌套的字典,但是生成的数据帧不是我需要的。
除了json_normalize之外,我还尝试了pd.DataFrame.from_dict。但是,直到现在一直没有成功。我也尝试过整理字典,但是什么也没有。
我使用了以下电话:
[73] results = requests.get(url).json()
results
输出为:
{'result': {'totalrows': 3124,
'rows': [{'rownum': 1,
'values': [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
{'field': 'issueid', 'value': 472683},
{'field': 'ticker', 'value': 'AAPL'},
{'field': 'companyname', 'value': 'APPLE INC'},
{'field': 'issuetitle', 'value': 'COM'},
{'field': 'filerid', 'value': 1089387}]},
{'rownum': 2,
'values': [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
{'field': 'issueid', 'value': 472683},
{'field': 'ticker', 'value': 'AAPL'},
{'field': 'companyname', 'value': 'APPLE INC'},
{'field': 'issuetitle', 'value': 'COM'},
{'field': 'filerid', 'value': 1086893}]},
{'rownum': 3,
'values': [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
{'field': 'issueid', 'value': 472683},
{'field': 'ticker', 'value': 'AAPL'},
{'field': 'companyname', 'value': 'APPLE INC'},
{'field': 'issuetitle', 'value': 'COM'},
{'field': 'filerid', 'value': 1085803}]}
然后使用以下代码生成数据框:
[74] Owners = results['result']['rows']
df1 = json_normalize(Owners)
df1.head()
这是输出:
rownum values
0 1 [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
{'field': 'issueid', 'value': 472683}, {'field':
'ticker', 'value': 'AAPL'}, {'field': 'companyname',
'value': 'APPLE INC'}, {'field': 'issuetitle', 'value':
'COM'}, {'field': 'filerid', 'value': 1089387}
1 2 [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
{'field': 'issueid', 'value': 472683}, {'field':
'ticker', 'value': 'AAPL'}, {'field': 'companyname',
'value': 'APPLE INC'}, {'field': 'issuetitle', 'value':
'COM'}, {'field': 'filerid', 'value': 1086893}
2 3 [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'}, {'field':
'issueid', 'value': 472683}, {'field': 'ticker', 'value': 'AAPL'},
{'field': 'companyname', 'value': 'APPLE INC'}, {'field':
'issuetitle', 'value': 'COM'}, {'field': 'filerid', 'value': 1085803}
但是,我要获取具有以下格式的DataFrame:
答案 0 :(得分:1)
您可以使用pandas.DataFrame.from_dict
,但需要删除数据中所有不必要的数据。实际上,您只想保留field
值和value
每行。您可以通过列表理解来做到这一点:
data = [{ field["field"]:field["value"] for field in row['values']
} for row in data['result']["rows"]]
print(data)
# [{'querydate': '7/31/2019 3:19 PM',
# 'issueid': 472683,
# 'ticker': 'AAPL',
# 'companyname': 'APPLE INC',
# 'issuetitle': 'COM',
# 'filerid': 1089387},
# {
# 'querydate': '7/31/2019 3:19 PM',
# 'issueid': 472683,
# 'ticker': 'AAPL',
# 'companyname': 'APPLE INC',
# 'issuetitle': 'COM',
# 'filerid': 1086893},
# {
# 'querydate': '7/31/2019 3:19 PM',
# 'issueid': 472683,
# 'ticker': 'AAPL',
# 'companyname': 'APPLE INC',
# 'issuetitle': 'COM',
# 'filerid': 1085803
# }]
一旦有了这本字典,就可以调用from_dict
方法:
df = pd.DataFrame.from_dict(data)
print(df)
# companyname filerid issueid issuetitle querydate ticker
# 0 APPLE INC 1089387 472683 COM 7/31/2019 3:19 PM AAPL
# 1 APPLE INC 1086893 472683 COM 7/31/2019 3:19 PM AAPL
# 2 APPLE INC 1085803 472683 COM 7/31/2019 3:19 PM AAPL
如果要获取rownum
作为列(或索引):
data = [{**{field["field"]:field["value"] for field in row['values']}, **{'rownum': row["rownum"]}} for row in data['result']["rows"]]
df = pd.DataFrame.from_dict(data)
print(df)
# companyname filerid issueid issuetitle querydate rownum ticker
# 0 APPLE INC 1089387 472683 COM 7/31/2019 3:19 PM 1 AAPL
# 1 APPLE INC 1086893 472683 COM 7/31/2019 3:19 PM 2 AAPL
# 2 APPLE INC 1085803 472683 COM 7/31/2019 3:19 PM 3 AAPL
答案 1 :(得分:0)
天真嵌套用于循环尝试...
import pandas as pd
df = pd.DataFrame([])
for row in json["result"]["rows"]:
rownum = row["rownum"]
querydate = issueid = ticker = companyname = issuetitle = filerid = None
for value_dict in row["values"]:
if value_dict["field"] == "querydate":
querydate = value_dict["value"]
elif value_dict["field"] == "issueid":
issueid = value_dict["value"]
elif value_dict["field"] == "ticker":
ticker = value_dict["value"]
elif value_dict["field"] == "companyname":
companyname = value_dict["value"]
elif value_dict["field"] == "filerid":
filerid = value_dict["value"]
df = df.append(pd.DataFrame({"rownum": rownum,
"querydate": querydate,
"issueid": issueid,
"ticker": ticker,
"companyname": companyname,
"issuetitle": issuetitle,
"filerid": filerid,
}, index=[0]), ignore_index=True)
print(df)
JSON对象:
json = {
"result": {
"totalrows": 3,
"rows": [
{
"rownum": 1,
"values": [
{
"field": "querydate",
"value": "7/31/2019 3:19 PM"
},
{
"field": "issueid",
"value": 472683
},
{
"field": "ticker",
"value": "AAPL"
},
{
"field": "companyname",
"value": "APPLE INC"
},
{
"field": "issuetitle",
"value": "COM"
},
{
"field": "filerid",
"value": 1089387
}
]
},
{
"rownum": 2,
"values": [
{
"field": "querydate",
"value": "7/31/2019 3:19 PM"
},
{
"field": "issueid",
"value": 472683
},
{
"field": "ticker",
"value": "AAPL"
},
{
"field": "companyname",
"value": "APPLE INC"
},
{
"field": "issuetitle",
"value": "COM"
},
{
"field": "filerid",
"value": 1086893
}
]
},
{
"rownum": 3,
"values": [
{
"field": "querydate",
"value": "7/31/2019 3:19 PM"
},
{
"field": "issueid",
"value": 472683
},
{
"field": "ticker",
"value": "AAPL"
},
{
"field": "companyname",
"value": "APPLE INC"
},
{
"field": "issuetitle",
"value": "COM"
},
{
"field": "filerid",
"value": 1085803
}
]
}
]
}
}
输出:
rownum querydate issueid ticker companyname issuetitle filerid
0 1 7/31/2019 3:19 PM 472683 AAPL APPLE INC COM 1089387
1 2 7/31/2019 3:19 PM 472683 AAPL APPLE INC COM 1086893
2 3 7/31/2019 3:19 PM 472683 AAPL APPLE INC COM 1085803