How can I remove a nested element from JSON data and extract the nested sub-data?

Posted: 2021-01-14 18:49:26

Tags: python json python-3.x python-requests

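I have data pulled from an API that outputs JSON in the format below. Notice that there is a nested element called "user". When I export this to another source system, the nested element creates duplicate values. My goal is to extract the data from the user element (id, first_name, etc.) and keep it in each record, just no longer nested under "user".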

Here is the raw JSON format generated by the API:

[
    {
        "enrollment_id": 12,
        "content_type": "sample",
        "user": {
            "id": 1,
            "first_name": "Sarah",
            "last_name": "Kis",
            "email": "s_kis@aol.com"
        },
        "campaign_name": "camp1",
        "policy_acknowledged": false
    },
    {
        "enrollment_id": 13,
        "content_type": "samplee",
        "user": {
            "id": 2,
            "first_name": "Sarahe",
            "last_name": "Kiss",
            "email": "s_kiss@aol.com"
        },
        "campaign_name": "camp2",
        "policy_acknowledged": false
    }
]

This is the format I want:

[
    {
        "enrollment_id": 12,
        "content_type": "sample",
        "id": 1,
        "first_name": "Sarah",
        "last_name": "Kis",
        "email": "s_kis@aol.com",
        "campaign_name": "camp1",
        "policy_acknowledged": false
    },
    {
        "enrollment_id": 13,
        "content_type": "samplee",
        "id": 2,
        "first_name": "Sarahe",
        "last_name": "Kiss",
        "email": "s_kiss@aol.com",
        "campaign_name": "camp2",
        "policy_acknowledged": false
    }
]

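**Notice how the data from the "user" element is now pulled up into the top level of each record. I know this is probably a quick fix, but I have spent hours on it without success.**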

Here is my code so far (see below). Note that it removes the user element from the JSON file completely, but I want to keep the data that was inside it.

import json

path1 = '/Users/t1_{0}.json'
path2 = '/Users/t2_{0}.json'

with open(path1, 'r') as f:
    data = json.load(f)

for element in data:
    element.pop('user', None)  # removes the whole "user" element, data included

with open(path2, 'w') as f:
    json.dump(data, f)
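A minimal sketch of the flattening the question asks for, assuming each record holds at most one nested dict under "user" (`flatten_record` is a hypothetical helper, not part of any library). Instead of calling `pop()` alone, it splices the nested fields in where "user" used to be, so the key order matches the desired output in the question:

```python
import json


def flatten_record(record, key="user"):
    """Return a copy of `record` with the nested `key` dict spliced in place.

    Fields from the nested dict are inserted where `key` used to be, so the
    key order matches the desired output. On a name clash, whichever value
    comes later in the record wins.
    """
    flat = {}
    for name, value in record.items():
        if name == key and isinstance(value, dict):
            flat.update(value)  # lift id, first_name, ... to the top level
        else:
            flat[name] = value
    return flat


raw = """[
  {"enrollment_id": 12, "content_type": "sample",
   "user": {"id": 1, "first_name": "Sarah", "last_name": "Kis",
            "email": "s_kis@aol.com"},
   "campaign_name": "camp1", "policy_acknowledged": false}
]"""

data = [flatten_record(element) for element in json.loads(raw)]
print(json.dumps(data, indent=4))
```

In the file-based code, the `element.pop('user', None)` loop would be replaced by `data = [flatten_record(element) for element in data]` before `json.dump`.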


import json

import requests

# my_proxy, api_header, rec_url, my_path and requests_retry_session are not shown in the question


def load_pst_rec_data(proxy=my_proxy, api_header=api_header,
                      url=rec_url, path=my_path):

    all_psts = ['2011676', '2345729']  # List of items I am filtering in the subsequent data
    the_list = []
    s = requests.Session()  # Create API session
    s.proxies = proxy

    for obj in all_psts:  # Loop through the items inside the all_psts variable
        for i in range(1, 10000000):  # The API is paginated, so loop through each page to collect data
            try:
                response = requests_retry_session(session=s). \
                    get(url + '{0}/recipients?page={1}&per_page=500'.format(obj, i), headers=api_header,
                        verify=False)  # Connect to the API
                resp = response.json()
            except Exception as e:
                print('It failed :(', e.__class__.__name__)
            else:
                print('It eventually worked', response.status_code)
                if resp:  # Consider using while resp: ______
                    the_list.extend(resp)  # Add this page's results to the list
                else:
                    last_page = str(i)  # Get the last page
                    print("Should stop and go to next object")
                    break  # an empty page means there is no more data for this obj
        print('process done!')

    # This section attempts to load the data collected to a json file
    print('Beginning Json process')
    path1 = '/Users/t1_{0}.json'
    path2 = '/Users/t2_{0}.json'
    try:
        with open(path1, 'r') as f:
            data = json.load(f)

        for element in data:
            element.pop('user', None)

        with open(path2, 'w') as f:
            json.dump(data, f)
    except Exception as e:
        print('It failed :(', e.__class__.__name__)
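The paging loop in `load_pst_rec_data` runs up to `range(1, 10000000)`, and its own comment suggests `while resp:`. Below is a minimal sketch of that idea that stops at the first empty page. It assumes the API returns an empty JSON list once you are past the last page (as the `if resp:` check implies). `collect_all_pages` and `fetch_page` are hypothetical names; `fake_fetch` serves canned data so the sketch runs offline.

```python
def collect_all_pages(fetch_page, per_page=500):
    """Call fetch_page(page, per_page) for page = 1, 2, ... until it returns
    an empty list, and return all records in one list.

    `fetch_page` is a hypothetical stand-in for the requests call in the
    question; it must return the parsed JSON list for one page.
    """
    records = []
    page = 1
    while True:
        batch = fetch_page(page, per_page)
        if not batch:  # empty page: no more data for this object
            break
        records.extend(batch)
        page += 1
    return records


# Offline stand-in: 3 records served in pages, then empty pages.
fake_data = [{"enrollment_id": n} for n in range(3)]


def fake_fetch(page, per_page):
    start = (page - 1) * per_page
    return fake_data[start:start + per_page]


print(collect_all_pages(fake_fetch, per_page=2))
```

In the question's function, `fetch_page` would wrap the `requests_retry_session(session=s).get(...)` call for one `obj` and return `response.json()`.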

3 answers:

Answer 0 (score: 2)


data = {
    "enrollment_id": 12,
    "content_type": "sample",
    "user": {
        "id": 1,
        "first_name": "Sarah",
        "last_name": "Kis",
        "email": "s_kis@aol.com"
    },
    "campaign_name": "camp1",
    "policy_acknowledged": False
}

# pop removes "user" from data and returns the nested dict
user_info = data.pop("user")
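The answer stops after `pop()`. Assuming the intended next step is to merge the popped dict back into the record (which is what the question's desired output shows), a complete version for one record could look like this:

```python
data = {
    "enrollment_id": 12,
    "content_type": "sample",
    "user": {"id": 1, "first_name": "Sarah", "last_name": "Kis",
             "email": "s_kis@aol.com"},
    "campaign_name": "camp1",
    "policy_acknowledged": False,
}

user_info = data.pop("user")  # removes "user" and returns the nested dict
data.update(user_info)        # merges id, first_name, ... into the top level
print(data)
```

Because `update()` appends keys that are new to the dict, the user fields end up after `policy_acknowledged` rather than where "user" was; for a JSON consumer the key order normally does not matter.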

Answer 1 (score: 1)

# an example list
data = [
    {"a": 1, "x": {"b": 2, "c": 3}},
    {"a": 4, "x": {"b": 5, "c": 6}},
]

# if you want to modify it in place (without creating a new list)
for element in data:
    # pop removes the item and returns it to you;
    # if the key doesn't exist it returns None by default, but here I've asked
    # it to return an empty dictionary instead
    x = element.pop("x", {})
    # update the parent dictionary with all the contents of x
    element.update(x)

print(data)

[{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6}]
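The loop above changes `data` in place. If the original list should stay untouched, a sketch that builds a new list instead, using dict unpacking (Python 3.5+), could look like this:

```python
data = [
    {"a": 1, "x": {"b": 2, "c": 3}},
    {"a": 4, "x": {"b": 5, "c": 6}},
]

# Build a new list instead of modifying `data`: copy every key except "x",
# then unpack the contents of "x" (or an empty dict if it is missing).
flat = [
    {**{k: v for k, v in element.items() if k != "x"}, **element.get("x", {})}
    for element in data
]

print(flat)     # [{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6}]
print(data[0])  # {'a': 1, 'x': {'b': 2, 'c': 3}} (original unchanged)
```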



Answer 2 (score: 0)


import pandas as pd
import json

data = '''
[{
    "enrollment_id": 12,
    "content_type": "sample",
    "user": {
        "id": 1,
        "first_name": "Sarah",
        "last_name": "Kis",
        "email": "s_kis@aol.com"
    },
    "campaign_name": "camp1",
    "policy_acknowledged": false
}, {
    "enrollment_id": 13,
    "content_type": "samplee",
    "user": {
        "id": 2,
        "first_name": "Sarahe",
        "last_name": "Kiss",
        "email": "s_kiss@aol.com"
    },
    "campaign_name": "camp2",
    "policy_acknowledged": false
}]
'''
data = json.loads(data)

# if the JSON structure is fixed (always the same keys)
df = pd.json_normalize(data)
# the nested keys become user.id, user.first_name, ...
data_new = df.to_json(orient='records')

# strip the 'user.' prefix from the column names
df.columns = df.columns.str.split(r'user\.').str[-1]
# if you want data2 to be a list rather than a str, use df.to_dict instead
data2 = df.to_dict(orient='records')


data_new (type: str)

[{
    "enrollment_id": 12,
    "content_type": "sample",
    "campaign_name": "camp1",
    "policy_acknowledged": false,
    "user.id": 1,
    "user.first_name": "Sarah",
    "user.last_name": "Kis",
    "user.email": "s_kis@aol.com"
}, {
    "enrollment_id": 13,
    "content_type": "samplee",
    "campaign_name": "camp2",
    "policy_acknowledged": false,
    "user.id": 2,
    "user.first_name": "Sarahe",
    "user.last_name": "Kiss",
    "user.email": "s_kiss@aol.com"
}]

data2 (type: list)

[{'enrollment_id': 12,
  'content_type': 'sample',
  'campaign_name': 'camp1',
  'policy_acknowledged': False,
  'id': 1,
  'first_name': 'Sarah',
  'last_name': 'Kis',
  'email': 's_kis@aol.com'},
 {'enrollment_id': 13,
  'content_type': 'samplee',
  'campaign_name': 'camp2',
  'policy_acknowledged': False,
  'id': 2,
  'first_name': 'Sarahe',
  'last_name': 'Kiss',
  'email': 's_kiss@aol.com'}]
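If pulling in pandas just for this is not wanted, the same dotted-key flattening can be sketched in plain Python. `flatten` and `strip_prefixes` are hypothetical helper names. Unlike `json_normalize`, this version keeps nested keys at the position of their parent key and does not expand lists of records. It also guards against one caveat of stripping `user.`: if the record had both a top-level `id` and a `user.id`, stripping would produce two columns named `id`, so the sketch raises an error instead.

```python
def flatten(record, parent="", sep="."):
    """Flatten nested dicts into a single level, joining keys with `sep`.

    Similar in spirit to pd.json_normalize for one record, except that
    nested keys stay where their parent key was, and lists are left as-is.
    """
    flat = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def strip_prefixes(flat, sep="."):
    """Keep only the last part of each dotted key ('user.id' -> 'id').

    Raises ValueError if two keys would end up with the same short name,
    which is the case where stripping 'user.' could lose data.
    """
    short = {name.rsplit(sep, 1)[-1]: value for name, value in flat.items()}
    if len(short) != len(flat):
        raise ValueError("stripping prefixes would merge keys: " + ", ".join(flat))
    return short


record = {
    "enrollment_id": 12,
    "content_type": "sample",
    "user": {"id": 1, "first_name": "Sarah"},
    "campaign_name": "camp1",
}

print(flatten(record))
print(strip_prefixes(flatten(record)))
```

For a list of records, apply it per element, e.g. `[strip_prefixes(flatten(r)) for r in data]`.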