Question

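I have data pulled from an API that outputs JSON in the format below. As you can see, there is a nested element called "user". When I export the data to another source system, this nested element creates duplicate values. My goal is to pull the data out of the user element (id, first_name, etc.) and keep that data in each record, without the "user" wrapper around it.

This is the raw JSON format produced by the API: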

[{
    "enrollment_id": 12,
    "content_type": "sample",
    "user": {
        "id": 1,
        "first_name": "Sarah",
        "last_name": "Kis",
        "email": "s_kis@aol.com"
    },
    "campaign_name": "camp1",
    "policy_acknowledged": false
}, {
    "enrollment_id": 13,
    "content_type": "samplee",
    "user": {
        "id": 2,
        "first_name": "Sarahe",
        "last_name": "Kiss",
        "email": "s_kiss@aol.com"
    },
    "campaign_name": "camp2",
    "policy_acknowledged": false
}]

This is the output I want, or something close to it:

[{
    "enrollment_id": 12,
    "content_type": "sample",
    "id": 1,
    "first_name": "Sarah",
    "last_name": "Kis",
    "email": "s_kis@aol.com",
    "campaign_name": "camp1",
    "policy_acknowledged": false
}, {
    "enrollment_id": 13,
    "content_type": "samplee",
    "id": 2,
    "first_name": "Sarahe",
    "last_name": "Kiss",
    "email": "s_kiss@aol.com",
    "campaign_name": "camp2",
    "policy_acknowledged": false
}]

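**Notice how the data from the "user" element has been pulled up into the top level of each record. I know this is probably a quick, simple fix, but I have spent hours trying to solve it with no luck.**

This is the code I have so far (see below). Note that it removes the user element from the JSON file entirely, including the data inside it. I want to keep that data.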

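import json

path1 = '/Users/t1_{0}.json'
path2 = '/Users/t2_{0}.json'

with open(path1, 'r') as the_list:
    data = json.load(the_list)

for element in data:
    # removes the whole 'user' element, including the data inside it
    element.pop('user', None)

with open(path2, 'w') as the_list:
    json.dump(data, the_list)  # json.dump returns None, so there is nothing to assign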

Here is my full code for reference:

def load_pst_rec_data(proxy=my_proxy, api_header=api_header,
                      url=rec_url, path=my_path):

    all_psts = ['2011676', '2345729']  # List of items I am filtering on in the subsequent data
    the_list = []
    s = requests.Session()  # Create API session
    s.proxies = proxy  # use the proxy passed in as a parameter (defaults to my_proxy)

    for obj in all_psts:  # Loop through the items inside the all_pst variable
        for i in range(1, 10000000):  # The API is paginated, so we have to loop through each page to collect data
            try:
                response = requests_retry_session(session=s). \
                    get(url + '{0}/recipients?page={1}&per_page=500'.format(obj, i), headers=api_header,
                        verify=False)  # Connect to the API
                resp = response.json()
            except Exception as e:
                print('It failed :(', e.__class__.__name__)
            else:
                print('It eventually worked', response.status_code)
                if resp:  # Consider using while resp: ______
                    the_list.extend(resp)  # Loop through results and add it to a list
                elif not resp:
                    last_page = str(i)  # Get the last page
                    print("Should stop and go to next object")
                    break
            finally:
                print('process done!')

    # This section attempts to load the data collected to a json file
    try:
        print('Beginning Json process')
    except Exception as e:
        print(e)
    else:
        path1 = '/Users/t1_{0}.json'
        path2 = '/Users/t2_{0}.json'

        with open(path1, 'r') as the_list:
            data = json.load(the_list)

        for element in data:
            element.pop('user', None)

        with open(path2, 'w') as the_list:
            json.dump(data, the_list)  # json.dump returns None, so there is nothing to assign

Answer 1

Is that data structure fixed? In other words, are you trying to solve this specific problem, without needing a more flexible solution?

data = {
    "enrollment_id": 12,
    "content_type": "sample",
    "user": {
        "id": 1,
        "first_name": "Sarah",
        "last_name": "Kis",
        "email": "s_kis@aol.com"
    },
    "campaign_name": "camp1",
    "policy_acknowledged": False
}

user_info = data.pop("user")
data.update(user_info)
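
Since the API returns a list of such records, the same two calls can be applied in a loop before the file is written. The following is a minimal sketch, assuming records shaped like the sample in the question. The helper name `flatten_user` is hypothetical, and the temporary file only stands in for the question's output path.

```python
import json
import os
import tempfile


def flatten_user(records):
    """Move the keys of each record's nested "user" dict up one level, in place."""
    for record in records:
        # pop removes "user" and returns its dict ({} if the key is missing)
        user_info = record.pop("user", {})
        record.update(user_info)
    return records


records = [
    {"enrollment_id": 12, "user": {"id": 1, "first_name": "Sarah"}, "campaign_name": "camp1"},
    {"enrollment_id": 13, "user": {"id": 2, "first_name": "Sarahe"}, "campaign_name": "camp2"},
]
flatten_user(records)

# Write the flattened records out; a temp file stands in for the real output path
out_path = os.path.join(tempfile.mkdtemp(), "flattened.json")
with open(out_path, "w") as fh:
    json.dump(records, fh, indent=2)

print(records[0])
```

Records that have no "user" key pass through unchanged, because `pop` falls back to an empty dictionary.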

Answer 2

# an example list
data = [
    {"a": 1, "x": { "b": 2, "c": 3 }},
    {"a": 4, "x": { "b": 5, "c": 6 }},
]

# if you want to modify it in-place (without creating a new list)
for element in data:
    # pop removes the item and returns it to you
    # if it doesn't exist, it returns None by default, but here I've asked
    # it to return an empty dictionary
    x = element.pop("x", {})
    # update the parent dictionary with all the contents of x
    element.update(x)

print(data)

Output:

[{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6}]

In your case, replace "x" with "user".

Take a look at the documentation for dict.pop and dict.update.
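
One thing to watch for with `update`: if the nested dictionary has a key that also exists in the parent, the nested value silently overwrites the parent's value. The sample data has no such overlap, but if yours might, a variant that builds new dictionaries and renames clashing keys could look like the sketch below. The function name `flatten_key` and the `user_` prefix convention are assumptions made for illustration, not part of the answer above.

```python
def flatten_key(records, key, prefix=None):
    """Return new dicts with each record's nested dict under `key` merged into the parent.

    Nested keys that clash with a parent key are renamed to prefix + name.
    The input list and its dictionaries are left unmodified.
    """
    prefix = f"{key}_" if prefix is None else prefix
    result = []
    for record in records:
        nested = record.get(key) or {}
        # copy everything except the nested element itself
        flat = {k: v for k, v in record.items() if k != key}
        for name, value in nested.items():
            target = prefix + name if name in flat else name
            flat[target] = value
        result.append(flat)
    return result


data = [
    {"id": 100, "user": {"id": 1, "email": "s_kis@aol.com"}},
    {"id": 101, "user": {"id": 2, "email": "s_kiss@aol.com"}},
]
flat = flatten_key(data, "user")
print(flat)
# [{'id': 100, 'user_id': 1, 'email': 's_kis@aol.com'},
#  {'id': 101, 'user_id': 2, 'email': 's_kiss@aol.com'}]
```

Because this version returns a new list, the original `data` still contains the "user" elements afterwards, which can be handy when both shapes are needed.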

Answer 3

Handling this with pandas may not be very efficient, but it can make the code easier to read.

import pandas as pd
import json

data = '''
[{
    "enrollment_id": 12,
    "content_type": "sample",
    "user": {
        "id": 1,
        "first_name": "Sarah",
        "last_name": "Kis",
        "email": "s_kis@aol.com"
    },
    "campaign_name": "camp1",
    "policy_acknowledged": false
}, {
    "enrollment_id": 13,
    "content_type": "samplee",
    "user": {
        "id": 2,
        "first_name": "Sarahe",
        "last_name": "Kiss",
        "email": "s_kiss@aol.com"
    },
    "campaign_name": "camp2",
    "policy_acknowledged": false
}]
'''
data = json.loads(data)

# if the json format is fixed
df = pd.json_normalize(data)
# nested keys become columns named user.id, user.first_name, ...
data_new = df.to_json(orient='records')

# strip the 'user.' prefix from the column names
df.columns = df.columns.str.split(r'user\.').str[-1]
# to_json returns a str; use df.to_dict if you want a list of dicts instead
data2 = df.to_dict(orient='records')

Output:

data_new (type: str)

[
  {
    "enrollment_id": 12,
    "content_type": "sample",
    "campaign_name": "camp1",
    "policy_acknowledged": false,
    "user.id": 1,
    "user.first_name": "Sarah",
    "user.last_name": "Kis",
    "user.email": "s_kis@aol.com"
  },
  {
    "enrollment_id": 13,
    "content_type": "samplee",
    "campaign_name": "camp2",
    "policy_acknowledged": false,
    "user.id": 2,
    "user.first_name": "Sarahe",
    "user.last_name": "Kiss",
    "user.email": "s_kiss@aol.com"
  }
]


data2 (type: list)

[{'enrollment_id': 12,
  'content_type': 'sample',
  'campaign_name': 'camp1',
  'policy_acknowledged': False,
  'id': 1,
  'first_name': 'Sarah',
  'last_name': 'Kis',
  'email': 's_kis@aol.com'},
 {'enrollment_id': 13,
  'content_type': 'samplee',
  'campaign_name': 'camp2',
  'policy_acknowledged': False,
  'id': 2,
  'first_name': 'Sarahe',
  'last_name': 'Kiss',
  'email': 's_kiss@aol.com'}]
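
If pulling in pandas only for this step feels heavy, a similar dotted-key result can be produced with the standard library alone. This is a sketch of the idea, not a description of how `pd.json_normalize` is implemented. The helper `flatten` is hypothetical, and it only descends into dictionaries (lists are left as they are).

```python
def flatten(record, sep=".", parent=""):
    """Recursively flatten nested dicts into one level with joined key names."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # descend into the nested dict, carrying the joined name as the prefix
            flat.update(flatten(value, sep=sep, parent=name))
        else:
            flat[name] = value
    return flat


data = [
    {
        "enrollment_id": 12,
        "user": {"id": 1, "first_name": "Sarah"},
        "campaign_name": "camp1",
    },
    {
        "enrollment_id": 13,
        "user": {"id": 2, "first_name": "Sarahe"},
        "campaign_name": "camp2",
    },
]
rows = [flatten(record) for record in data]
print(rows[0])
```

Unlike the pandas output shown above, the flattened keys stay in the position where the nested element appeared instead of moving to the end.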

How do I remove a nested element in JSON data and extract its nested contents?

3 answers: