I am currently unable to parse a deeply nested JSON response from an HTTP API call.
My JSON response looks like this:
{'took': 476,
'_revision': 'r08badf3',
'response': {'accounts': {'hits': [{'name': '4002238760',
'display_name': 'Googleglass-4002238760',
'selected_fields': ['Googleglass',
'DDMonkey',
'Papu New Guinea',
'Jonathan Vardharajan',
'4002238760',
'DDMadarchod-INSTE',
None,
'Googleglass',
'0001012556',
'CC',
'Setu Non Standard',
'40022387',
320142,
4651321321333,
1324650651651]},
{'name': '4003893720',
'display_name': 'Swift-4003893720',
'selected_fields': ['Swift',
'DDMonkey',
'Papu New Guinea',
'Jonathan Vardharajan',
'4003893720',
'DDMadarchod-UPTM-RemotexNBD',
None,
'S.W.I.F.T. SCRL',
'0001000110',
'SE',
'Setu Non Standard',
'40038937',
189508,
1464739200000,
1559260800000]},
After receiving the response, I use json_normalize to store it in a data object:
data = response.json()
data = data['response']['accounts']['hits']
data = json_normalize(data)
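However, after normalizing, the DataFrame looks like this (the original post linked a screenshot here).
My curl command looks like this: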
curl --data 'query= {"terms":[{"type":"string_attribute","attribute":"Account Type","query_term_id":"account_type","in_list":["Contract"]},{"type":"string","term":"status_group","in_list":["paying"]},{"type":"string_attribute","attribute":"Region","in_list":["DDEU"]},{"type":"string_attribute","attribute":"Country","in_list":["Belgium"]},{"type":"string_attribute","attribute":"CSM Tag","in_list":["EU CSM"]},{"type":"date_attribute","attribute":"Contract Renewal Date","gte":1554057000000,"lte":1561833000000}],"count":1000,"offset":0,"fields":[{"type":"string_attribute","attribute":"DomainName","field_display_name":"Client Name"},{"type":"string_attribute","attribute":"Region","field_display_name":"Region"},{"type":"string_attribute","attribute":"Country","field_display_name":"Country"},{"type":"string_attribute","attribute":"Success Manager","field_display_name":"Client Success Manager"},{"type":"string","term":"identifier","field_display_name":"Account id"},{"type":"string_attribute","attribute":"DeviceSLA","field_display_name":"[FIN] Material Part Number"},{"type":"string_attribute","attribute":"SFDCAccountId","field_display_name":"SFDCAccountId"},{"type":"string_attribute","attribute":"Client","field_display_name":"[FIN] Client Sold-To Name"},{"type":"string_attribute","attribute":"Sold To Code","field_display_name":"[FIN] Client Sold To Code"},{"type":"string_attribute","attribute":"BU","field_display_name":"[FIN] Active BUs"},{"type":"string_attribute","attribute":"Service Type","field_display_name":"[FIN] Service Type"},{"type":"string_attribute","attribute":"Contract Header ID","field_display_name":"[FIN] SAP Contract Header ID"},{"type":"number_attribute","attribute":"Contract Value","field_display_name":"[FIN] ACV - Annual Contract Value","desc":true},{"type":"date_attribute","attribute":"Contract Start Date","field_display_name":"[FIN] Contract Start Date"},{"type":"date_attribute","attribute":"Contract Renewal Date","field_display_name":"[FIN] Contract Renewal Date"}],"scope":"all"}' --header 'app-token:YOUR-TOKEN-HERE' 'https://app.totango.com/api/v1/search/accounts'
So ultimately I want to store the response in a DataFrame together with the field names.
Answer 0 (score: 0)
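I have had to do this kind of thing (flattening a nested JSON) a few times in the past. I will explain my process, and you can see whether it works for you, or at least adapt it to your needs.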
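1) Take the data response and flatten it completely with a function. A blog post on the topic was helpful when I first did this.
2) Iterate through the resulting flat dictionary and use the numbers embedded in the new key names (from the nested parts) to work out where each row and column needs to be created. Some keys are unique and carry no number marking a "new" row, so I collect those separately in a list I call special_cols.
3) While iterating, pull out the row number embedded in each flat key and build the DataFrame that way.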
This sounds complicated, but if you debug and run it line by line you can see how it works. Either way, I believe it does what you need.
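To make steps 1 to 3 concrete, here is a minimal sketch on a tiny made-up input. The `sample` dict and the `split_key` helper are hypothetical names used only for illustration; the flattening function follows the same idea as the answer's `flatten_json`. It shows what the flat keys look like and how the first number in a key is read as the row index.

```python
import re


def flatten_json(nested):
    """Flatten nested dicts/lists into one dict with '_'-joined keys."""
    out = {}

    def flatten(node, prefix=''):
        if isinstance(node, dict):
            for key, value in node.items():
                flatten(value, prefix + key + '_')
        elif isinstance(node, list):
            for i, value in enumerate(node):
                flatten(value, prefix + str(i) + '_')
        else:
            # Leaf value: drop the trailing '_' from the accumulated key.
            out[prefix[:-1]] = node

    flatten(nested)
    return out


# Tiny hypothetical input, shaped like the API response but much smaller.
sample = {'took': 1, 'hits': [{'name': 'a', 'tags': ['x', 'y']},
                              {'name': 'b', 'tags': ['z']}]}

flat = flatten_json(sample)
# flat == {'took': 1, 'hits_0_name': 'a', 'hits_0_tags_0': 'x',
#          'hits_0_tags_1': 'y', 'hits_1_name': 'b', 'hits_1_tags_0': 'z'}


def split_key(key):
    """Return (row_index, column), or None if the key has no row number."""
    match = re.search(r'_(\d+)_(.*)', key)
    if match is None:
        return None
    return int(match.group(1)), match.group(2).replace('_', '')


parsed = {key: split_key(key) for key in flat}
# 'hits_1_tags_0' maps to (1, 'tags0'); 'took' maps to None.
```

Keys that map to `None` here are the ones that end up in special_cols.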
data = {'took': 476,
'_revision': 'r08badf3',
'response': {'accounts': {'hits': [{'name': '4002238760',
'display_name': 'Googleglass-4002238760',
'selected_fields': ['Googleglass',
'DDMonkey',
'Papu New Guinea',
'Jonathan Vardharajan',
'4002238760',
'DDMadarchod-INSTE',
None,
'Googleglass',
'0001012556',
'CC',
'Setu Non Standard',
'40022387',
320142,
4651321321333,
1324650651651]},
{'name': '4003893720',
'display_name': 'Swift-4003893720',
'selected_fields': ['Swift',
'DDMonkey',
'Papu New Guinea',
'Jonathan Vardharajan',
'4003893720',
'DDMadarchod-UPTM-RemotexNBD',
None,
'S.W.I.F.T. SCRL',
'0001000110',
'SE',
'Setu Non Standard',
'40038937',
189508,
1464739200000,
1559260800000]}]}}}
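import re

import pandas as pd


def flatten_json(y):
    """Flatten nested dicts and lists into one dict with '_'-joined keys."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for a in x:
                flatten(x[a], name + a + '_')
        elif isinstance(x, list):
            for i, a in enumerate(x):
                flatten(a, name + str(i) + '_')
        else:
            # Leaf value: drop the trailing '_' from the accumulated key.
            out[name[:-1]] = x

    flatten(y)
    return out


flat = flatten_json(data)

results = pd.DataFrame()
special_cols = []

for item in flat:
    # The first number wrapped in underscores is the row index,
    # e.g. 'response_accounts_hits_0_name' -> row 0.
    match = re.search(r'_(\d+)_(.*)', item)
    if match is None:
        # No row number in the key (e.g. 'took'): handle it after the loop.
        special_cols.append(item)
        continue
    row_idx = int(match.group(1))
    # Everything after the row number becomes the column name.
    column = match.group(2).replace('_', '')
    results.loc[row_idx, column] = flat[item]

# Keys without a row number hold a single value, repeated on every row.
for item in special_cols:
    results[item] = flat[item]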
Output:
print (results.to_string())
name displayname selectedfields0 selectedfields1 selectedfields2 selectedfields3 selectedfields4 selectedfields5 selectedfields6 selectedfields7 selectedfields8 selectedfields9 selectedfields10 selectedfields11 selectedfields12 selectedfields13 selectedfields14 took _revision
0 4002238760 Googleglass-4002238760 Googleglass DDMonkey Papu New Guinea Jonathan Vardharajan 4002238760 DDMadarchod-INSTE NaN Googleglass 0001012556 CC Setu Non Standard 40022387 320142.0 4.651321e+12 1.324651e+12 476 r08badf3
1 4003893720 Swift-4003893720 Swift DDMonkey Papu New Guinea Jonathan Vardharajan 4003893720 DDMadarchod-UPTM-RemotexNBD NaN S.W.I.F.T. SCRL 0001000110 SE Setu Non Standard 40038937 189508.0 1.464739e+12 1.559261e+12 476 r08badf3
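If the goal is columns labelled with the field names rather than selectedfields0, selectedfields1 and so on, one possible approach is to pair each value with the display names from the query. This is only a sketch: it assumes the values in selected_fields come back in the same order as the "fields" list sent in the curl query, which the sample data suggests but the question does not confirm. `hits_to_records` is a hypothetical helper name, and `field_names` is copied by hand from the field_display_name entries in the query.

```python
# Display names copied from the "fields" list of the curl query, in order.
field_names = [
    'Client Name', 'Region', 'Country', 'Client Success Manager', 'Account id',
    '[FIN] Material Part Number', 'SFDCAccountId', '[FIN] Client Sold-To Name',
    '[FIN] Client Sold To Code', '[FIN] Active BUs', '[FIN] Service Type',
    '[FIN] SAP Contract Header ID', '[FIN] ACV - Annual Contract Value',
    '[FIN] Contract Start Date', '[FIN] Contract Renewal Date',
]

# The two hits from the sample response, i.e. data['response']['accounts']['hits'].
hits = [
    {'name': '4002238760', 'display_name': 'Googleglass-4002238760',
     'selected_fields': ['Googleglass', 'DDMonkey', 'Papu New Guinea',
                         'Jonathan Vardharajan', '4002238760',
                         'DDMadarchod-INSTE', None, 'Googleglass',
                         '0001012556', 'CC', 'Setu Non Standard', '40022387',
                         320142, 4651321321333, 1324650651651]},
    {'name': '4003893720', 'display_name': 'Swift-4003893720',
     'selected_fields': ['Swift', 'DDMonkey', 'Papu New Guinea',
                         'Jonathan Vardharajan', '4003893720',
                         'DDMadarchod-UPTM-RemotexNBD', None,
                         'S.W.I.F.T. SCRL', '0001000110', 'SE',
                         'Setu Non Standard', '40038937',
                         189508, 1464739200000, 1559260800000]},
]


def hits_to_records(hits, field_names):
    """Build one dict per hit, keyed by name, display_name and field names."""
    records = []
    for hit in hits:
        values = hit['selected_fields']
        if len(values) != len(field_names):
            # Guard against the ordering assumption silently breaking.
            raise ValueError('selected_fields length does not match field_names')
        record = {'name': hit['name'], 'display_name': hit['display_name']}
        record.update(zip(field_names, values))
        records.append(record)
    return records


records = hits_to_records(hits, field_names)
```

Passing `records` to `pd.DataFrame(records)` then gives one row per account and one column per field name.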