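I have a nested JSON with the structure shown below. For each JSON, I want to iteratively extract data from each level: "StreamId" at the first level, then the "TaskID" entries (i.e. TaskID_1, TaskID_2) inside "Summary_breakdown", and within each "TaskID" every item except "tools_items", because it is too long and could cause problems inside the DataFrame.
I want to write the result to a dictionary and eventually analyze it in a DataFrame.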
{
  "success": true,
  "resource": {
    "StreamId": "xyz",
    "Summary_Measures": {
      "Summary_Report": {
        "Total_Cost": 7000,
        "Total_hours": 6087,
        "Summary_breakdown": {
          "TaskID_1": {
            "Task_details": "abc",
            "Task_cost": 300,
            "Task_hours": 87,
            "tools_items": "an_extremely_long_string"
          },
          "TaskID_2": {
            "Task_details": "defgyh",
            "Task_cost": 400,
            "Task_hours": 6000,
            "tools_items": "another_extremely_long_string"
          }
        }
      }
    }
  }
}
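I managed to generate a list of URLs and store the JSON responses in a list, but in the second part of the script (`responseitem`) I cannot extract the "TaskID" layer or the parameters inside each "TaskID". I tried to bypass the "TaskID" layer, but the code still does not run. Any solutions or suggestions would be much appreciated!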
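One likely reason that bypassing the "TaskID" layer fails: "Summary_breakdown" is a JSON object, so `json.loads` turns it into a Python dict keyed by task ID, not a list. The minimal sketch below uses a trimmed-down copy of the sample (the variable names are illustrative, not from the original script) to show that integer indexing raises `KeyError`, while `.items()` walks each task ID together with its fields.

```python
# Trimmed-down copy of the sample "Summary_breakdown" object (values shortened).
breakdown = {
    "TaskID_1": {"Task_details": "abc", "Task_cost": 300, "Task_hours": 87},
    "TaskID_2": {"Task_details": "defgyh", "Task_cost": 400, "Task_hours": 6000},
}

# Integer indexing looks up the key 0, which does not exist in this dict.
try:
    breakdown[0]
    index_error = None
except KeyError as exc:
    index_error = exc

# Iterating with .items() yields each task ID together with its fields.
task_ids = [task_id for task_id, fields in breakdown.items()]
task_costs = {task_id: fields["Task_cost"] for task_id, fields in breakdown.items()}

print(repr(index_error))
print(task_ids)
print(task_costs)
```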
import json
import pandas as pd
from urllib.request import urlopen

stream_id = ['sdfhef', 'VVqdhi']

# Build one endpoint URL per stream ID
myurl_link = []
for sid in stream_id:
    endpoint = "https://~/%s/~" % sid
    myurl_link.append(endpoint)

# Fetch each URL and parse the JSON response
myjslist = []
for link in myurl_link:
    g = urlopen(link).read().decode('UTF-8')
    g_resp = json.loads(g)
    myjslist.append(g_resp)
# Second part: try to pull the task fields out of each response
responseitem = []
for item in myjslist:
    stream = item['resource']['StreamId']
    # Summary_breakdown is a dict keyed by task ID, so the [0] lookups below raise KeyError
    taskdetails = item['resource']['Summary_Measures']['Summary_Report']['Summary_breakdown'][0]['Task_details']
    taskcost = item['resource']['Summary_Measures']['Summary_Report']['Summary_breakdown'][0]['Task_cost']
    taskhours = item['resource']['Summary_Measures']['Summary_Report']['Summary_breakdown'][0]['Task_hours']
    responseitem.append({'taskdetails': taskdetails, 'taskcost': taskcost, 'taskhours': taskhours})

# Append the collected records to a file, one JSON array per run
with open('responseitem.json', 'a') as f:
    json.dump(responseitem, f)
    f.write("\n")
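A possible way to do the extraction, sketched under the assumption that every response has the same shape as the sample: loop over the items of "Summary_breakdown", keep the task ID as its own column, and copy every field except "tools_items". The helper name `flatten_tasks`, its `skip` parameter and the `sample` dict are hypothetical and stand in for one element of `myjslist`.

```python
import json


def flatten_tasks(response, skip=("tools_items",)):
    """Return one flat dict per task found in a single parsed response."""
    resource = response["resource"]
    breakdown = resource["Summary_Measures"]["Summary_Report"]["Summary_breakdown"]
    rows = []
    for task_id, fields in breakdown.items():
        # Start each row with the identifiers from the outer levels.
        row = {"StreamId": resource["StreamId"], "TaskID": task_id}
        # Copy every task field except the ones listed in `skip`.
        row.update({key: value for key, value in fields.items() if key not in skip})
        rows.append(row)
    return rows


# Hypothetical stand-in for one element of myjslist, copied from the sample.
sample = {
    "success": True,
    "resource": {
        "StreamId": "xyz",
        "Summary_Measures": {
            "Summary_Report": {
                "Total_Cost": 7000,
                "Total_hours": 6087,
                "Summary_breakdown": {
                    "TaskID_1": {
                        "Task_details": "abc",
                        "Task_cost": 300,
                        "Task_hours": 87,
                        "tools_items": "an_extremely_long_string",
                    },
                    "TaskID_2": {
                        "Task_details": "defgyh",
                        "Task_cost": 400,
                        "Task_hours": 6000,
                        "tools_items": "another_extremely_long_string",
                    },
                },
            }
        },
    },
}

rows = flatten_tasks(sample)
for row in rows:
    print(json.dumps(row))
```

For the full list of responses, the rows can be collected with `rows = [row for item in myjslist for row in flatten_tasks(item)]`, and `pd.DataFrame(rows)` should then give one DataFrame row per task, with "StreamId" and "TaskID" as ordinary columns.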