将深层嵌套的JSON响应从API调用转换为pandas数据框

时间:2019-03-19 10:19:40

标签: python json pandas api response

我目前无法解析来自HTTP API调用的深度嵌套的JSON响应。

我的JSON响应就像

{'took': 476,
 '_revision': 'r08badf3',
 'response': {'accounts': {'hits': [{'name': '4002238760',
     'display_name': 'Googleglass-4002238760',
     'selected_fields': ['Googleglass',
      'DDMonkey',
      'Papu New Guinea',
      'Jonathan Vardharajan',
      '4002238760',
      'DDMadarchod-INSTE',
      None,
      'Googleglass',
      '0001012556',
      'CC',
      'Setu Non Standard',
      '40022387',
      320142,
      4651321321333,
      1324650651651]},
    {'name': '4003893720',
     'display_name': 'Swift-4003893720',
     'selected_fields': ['Swift',
      'DDMonkey',
      'Papu New Guinea',
      'Jonathan Vardharajan',
      '4003893720',
      'DDMadarchod-UPTM-RemotexNBD',
      None,
      'S.W.I.F.T. SCRL',
      '0001000110',
      'SE',
      'Setu Non Standard',
      '40038937',
      189508,
      1464739200000,
      1559260800000]},

收到响应后,我将使用json normalize将其存储在数据对象中

data = response.json()
data = data['response']['accounts']['hits']
data = json_normalize(data)

但是,我标准化后,数据框看起来像this

我的Curl语句看起来像这样

curl --data 'query= {"terms":[{"type":"string_attribute","attribute":"Account Type","query_term_id":"account_type","in_list":["Contract"]},{"type":"string","term":"status_group","in_list":["paying"]},{"type":"string_attribute","attribute":"Region","in_list":["DDEU"]},{"type":"string_attribute","attribute":"Country","in_list":["Belgium"]},{"type":"string_attribute","attribute":"CSM Tag","in_list":["EU CSM"]},{"type":"date_attribute","attribute":"Contract Renewal Date","gte":1554057000000,"lte":1561833000000}],"count":1000,"offset":0,"fields":[{"type":"string_attribute","attribute":"DomainName","field_display_name":"Client Name"},{"type":"string_attribute","attribute":"Region","field_display_name":"Region"},{"type":"string_attribute","attribute":"Country","field_display_name":"Country"},{"type":"string_attribute","attribute":"Success Manager","field_display_name":"Client Success Manager"},{"type":"string","term":"identifier","field_display_name":"Account id"},{"type":"string_attribute","attribute":"DeviceSLA","field_display_name":"[FIN] Material Part Number"},{"type":"string_attribute","attribute":"SFDCAccountId","field_display_name":"SFDCAccountId"},{"type":"string_attribute","attribute":"Client","field_display_name":"[FIN] Client Sold-To Name"},{"type":"string_attribute","attribute":"Sold To Code","field_display_name":"[FIN] Client Sold To Code"},{"type":"string_attribute","attribute":"BU","field_display_name":"[FIN] Active BUs"},{"type":"string_attribute","attribute":"Service Type","field_display_name":"[FIN] Service Type"},{"type":"string_attribute","attribute":"Contract Header ID","field_display_name":"[FIN] SAP Contract Header ID"},{"type":"number_attribute","attribute":"Contract Value","field_display_name":"[FIN] ACV - Annual Contract Value","desc":true},{"type":"date_attribute","attribute":"Contract Start Date","field_display_name":"[FIN] Contract Start Date"},{"type":"date_attribute","attribute":"Contract Renewal Date","field_display_name":"[FIN] Contract Renewal Date"}],"scope":"all"}' --header 'app-token:YOUR-TOKEN-HERE' 'https://app.totango.com/api/v1/search/accounts'

所以最终我想将响应与字段名称一起存储在数据框中。

1 个答案:

答案 0 :(得分:0)

过去,我不得不做几次这样的事情(拼出一个嵌套的json),我将解释我的过程,然后您可以查看它是否有效,或者至少可以正常工作。满足您的需求。

1)采取了data响应,并使用函数将其完全展平。当我第一次这样做时,这个blog很有帮助。

2)然后遍历所创建的平面字典,以通过嵌套部分中新键名称的编号来查找需要在何处创建每一行和每一列。还有一些键是唯一的/不同的,因此它们没有数字来标识为“新”行,因此我在我称为special_cols的键中进行了说明。

3)在遍历这些行时,拉出指定的行号(嵌入在这些平面键中),然后以这种方式构造数据帧。

这听起来很复杂,但是如果您逐行调试和运行,则可以看到它是如何工作的。尽管如此,我相信它可以满足您的需求。

data = {'took': 476,
 '_revision': 'r08badf3',
 'response': {'accounts': {'hits': [{'name': '4002238760',
     'display_name': 'Googleglass-4002238760',
     'selected_fields': ['Googleglass',
      'DDMonkey',
      'Papu New Guinea',
      'Jonathan Vardharajan',
      '4002238760',
      'DDMadarchod-INSTE',
      None,
      'Googleglass',
      '0001012556',
      'CC',
      'Setu Non Standard',
      '40022387',
      320142,
      4651321321333,
      1324650651651]},
    {'name': '4003893720',
     'display_name': 'Swift-4003893720',
     'selected_fields': ['Swift',
      'DDMonkey',
      'Papu New Guinea',
      'Jonathan Vardharajan',
      '4003893720',
      'DDMadarchod-UPTM-RemotexNBD',
      None,
      'S.W.I.F.T. SCRL',
      '0001000110',
      'SE',
      'Setu Non Standard',
      '40038937',
      189508,
      1464739200000,
      1559260800000]}]}}}


import pandas as pd
import re


def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(y)
    return out

flat = flatten_json(data)                      


results = pd.DataFrame()
special_cols = []

columns_list = list(flat.keys())
for item in columns_list:
    try:
        row_idx = re.findall(r'\_(\d+)\_', item )[0]
    except:
        special_cols.append(item)
        continue
    column = re.findall(r'\_\d+\_(.*)', item )[0]
    column = column.replace('_', '')

    row_idx = int(row_idx)
    value = flat[item]

    results.loc[row_idx, column] = value

for item in special_cols:
    results[item] = flat[item]

输出:

print (results.to_string())
         name             displayname selectedfields0 selectedfields1  selectedfields2       selectedfields3 selectedfields4              selectedfields5  selectedfields6  selectedfields7 selectedfields8 selectedfields9   selectedfields10 selectedfields11  selectedfields12  selectedfields13  selectedfields14  took _revision
0  4002238760  Googleglass-4002238760     Googleglass        DDMonkey  Papu New Guinea  Jonathan Vardharajan      4002238760            DDMadarchod-INSTE              NaN      Googleglass      0001012556              CC  Setu Non Standard         40022387          320142.0      4.651321e+12      1.324651e+12   476  r08badf3
1  4003893720        Swift-4003893720           Swift        DDMonkey  Papu New Guinea  Jonathan Vardharajan      4003893720  DDMadarchod-UPTM-RemotexNBD              NaN  S.W.I.F.T. SCRL      0001000110              SE  Setu Non Standard         40038937          189508.0      1.464739e+12      1.559261e+12   476  r08badf3