Converting complex/nested JSON to a DataFrame

Date: 2020-06-10 12:27:51

Tags: python json pandas api dataframe

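I have a complex, nested JSON that I need to convert to a DataFrame (Python). I can get the first level into a DataFrame, but I am struggling with the nested second level.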

import requests
from pandas import DataFrame
import json

url = 'url'

headers = {'api-key':'key'}

resp = requests.get(url, headers = headers)
print(resp.status_code)

r = resp.content
r

responses = json.loads(r.decode('utf-8'))
responses

Output (responses):

{'count': 39,
 'requestAt': '2020-06-09T20:10:23.201+00:00',
 'data': {'Id1': {'id': 'Id1',
   'groupId': '1',
   'label': 'Question 1',
   'options': {'1_1': {'id': '1_1',
     'prefix': 'A',
     'label': 'Alternative A',
     'isCorrect': True},
    '1_2': {'id': '1_2',
     'prefix': 'B',
     'label': 'Alternative B',
     'isCorrect': False},
    '1_3': {'id': '1_3',
     'prefix': 'C',
     'label': 'Alternative C',
     'isCorrect': False}}}}}
df = DataFrame(responses['data'])
df.T

Output (DataFrame.T):

+-----+---------+------------+-------------+
| id  | groupId |   label    | options     |
+-----+---------+------------+-------------+
| Id1 |       1 | Question 1 | **JSON 2**  |
+-----+---------+------------+-------------+
 **JSON 2** (all inside the cell above)
{'1_1': {'id': '1_1',
     'prefix': 'A',
     'label': 'Alternative A',
     'isCorrect': True},
    '1_2': {'id': '1_2',
     'prefix': 'B',
     'label': 'Alternative B',
     'isCorrect': False},
    '1_3': {'id': '1_3',
     'prefix': 'C',
     'label': 'Alternative C',
     'isCorrect': False}}

I also need to expand JSON 2 into the DataFrame, with one row per option.

Desired output:

+-----+---------+------------+--------+---------------+-----------+
| id  | groupId |   label    | prefix |     label     | isCorrect |
+-----+---------+------------+--------+---------------+-----------+
| Id1 |       1 | Question 1 | A      | Alternative A | True      |
| Id1 |       1 | Question 1 | B      | Alternative B | False     |
| Id1 |       1 | Question 1 | C      | Alternative C | False     |
+-----+---------+------------+--------+---------------+-----------+

How can I get the desired output? Thanks.

1 Answer:

Answer 0 (score: 1)

Here is one way to do it:

import pandas as pd 

responses = {
    'count': 39,
    'requestAt': '2020-06-09T20:10:23.201+00:00',
    'data': {
        'Id1': {
            'id': 'Id1',
            'groupId': '1',
            'label': 'Question 1',
            'options': {
                '1_1': {
                    'id': '1_1',
                    'prefix': 'A',
                    'label': 'Alternative A',
                    'isCorrect': True},
                '1_2': {
                    'id': '1_2',
                    'prefix': 'B',
                    'label': 'Alternative B',
                    'isCorrect': False},
                '1_3': {
                    'id': '1_3',
                    'prefix': 'C',
                    'label': 'Alternative C',
                    'isCorrect': False}
            }
        }
    }
}


# refactor response to a list of dicts
# where each item is a dictionary of keys and values 
# corresponding to a single row of dataframe
response_list = []

for question in responses['data'].values():

    # keep only the question-level keys of interest
    data = {k: v for k, v in question.items() if k in ['id', 'groupId', 'label']}

    # rename the question's 'label' key to 'label_': each option also has a 'label' key,
    # and the dataframe should not end up with two columns of the same name
    data['label_'] = data.pop('label')

    # walk the options nested under the current question
    for option in question['options'].values():

        # keep only the option-level keys of interest
        inner_data = {k: v for k, v in option.items() if k in ['prefix', 'label', 'isCorrect']}

        # merge the question-level and option-level dicts into one row and append it
        response_list.append({**data, **inner_data})

print(pd.DataFrame(response_list))

Here is the output:

    id groupId      label_ prefix          label  isCorrect
0  Id1       1  Question 1      A  Alternative A       True
1  Id1       1  Question 1      B  Alternative B      False
2  Id1       1  Question 1      C  Alternative C      False
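
As an alternative sketch (not part of the original answer), `pd.json_normalize` can probably do the same flattening, assuming pandas 1.0 or later where it is exposed at the top level. It expects `record_path` to point at a list, so each `options` dict is converted to a list of its values first. The `option_` prefix is my own choice here, used to keep the option-level `id` and `label` from clashing with the question-level columns of the same name.

```python
import pandas as pd

# Same sample payload as in the question, trimmed to the 'data' part.
data = {
    'Id1': {
        'id': 'Id1',
        'groupId': '1',
        'label': 'Question 1',
        'options': {
            '1_1': {'id': '1_1', 'prefix': 'A', 'label': 'Alternative A', 'isCorrect': True},
            '1_2': {'id': '1_2', 'prefix': 'B', 'label': 'Alternative B', 'isCorrect': False},
            '1_3': {'id': '1_3', 'prefix': 'C', 'label': 'Alternative C', 'isCorrect': False},
        },
    },
}

# Turn each question's 'options' dict into a list of option dicts,
# because record_path must resolve to a list of records.
questions = [
    {**question, 'options': list(question['options'].values())}
    for question in data.values()
]

# One row per option; the question-level fields are repeated as metadata.
df = pd.json_normalize(
    questions,
    record_path='options',
    meta=['id', 'groupId', 'label'],
    record_prefix='option_',  # assumed prefix, avoids id/label name clashes
)

# Put the question-level columns first and drop the option id.
df = df[['id', 'groupId', 'label', 'option_prefix', 'option_label', 'option_isCorrect']]
print(df)
```

This should give one row per option with the question fields repeated, like the loop above. The explicit loop is easier to adapt if the nesting is irregular, while `json_normalize` is shorter when every question has the same shape.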