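Flatten triple-nested JSON using json_normalize

Time: 2019-08-01 13:34:13

Tags: python json pandas python-2.7

I am trying to flatten the following, but it only works for JSON that is not triple nested.

Working code: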

import json

import pandas as pd
from pandas.io.json import json_normalize

data = [{'masterName': 'AAAAAAAAAAA',
         'shortname': 'AA',
         'info': {
              'name': 'randomka'
         },
         'mainNames': [{'date': '2019-05-16', 'NumberOne': 1111},
                       {'date': '2019-06-22', 'NumberOne': 2222}]}
       ]

result = json_normalize(data, 'mainNames', ['masterName', 'shortname',
                                          ['info', 'name']],errors='ignore')

Not working:

data2 = [{"masterName": "AAAAAAAAAAA",
          "mainNames": [
            {
                "numbers": [{
                        "date": "2019-05-16",
                        "NumberOne": 222}],
                "name": "randomka"
            },
            {
                "numbers": [{
                        "date": "2019-05-16",
                        "NumberOne": 222}],
                "name": "randomka"
            }
        ]
    }]

json_normalize(data2, 'mainNames', ['masterName'], errors='ignore')

It returns:

[image: output of the call above]

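I have tried different combinations of record_path and meta in the json_normalize call, but I cannot get it to work for this three-level JSON. In other words, I cannot pull out all the columns in one go.

One alternative I tried works and looks close: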

json_normalize(data2, ['mainNames','numbers'], ['masterName'],errors='ignore') 

The output is almost an Excel-like view, with the data in columns. Expected view, as requested in the comments:

[image: expected output]
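The `['mainNames', 'numbers']` call above already produces one row per innermost entry; what it leaves out is the per-item `name`. A minimal sketch of one way to keep it, assuming pandas 1.0 or later (where the function is exposed as `pd.json_normalize`), is to list `name` in `meta` as a nested path. Whether the older `pandas.io.json` version on Python 2.7 behaves the same way has not been checked here.

```python
import pandas as pd

# Same shape as data2 in the question.
data2 = [{"masterName": "AAAAAAAAAAA",
          "mainNames": [
              {"numbers": [{"date": "2019-05-16", "NumberOne": 222}],
               "name": "randomka"},
              {"numbers": [{"date": "2019-05-16", "NumberOne": 222}],
               "name": "randomka"}]}]

# record_path walks down to the innermost list. Each meta entry is a path
# to a field that should be repeated on every row built from that list.
flat = pd.json_normalize(
    data2,
    record_path=["mainNames", "numbers"],
    meta=["masterName", ["mainNames", "name"]],
)
print(flat)
```

The nested meta field comes back under a dotted column name, `mainNames.name`, so it may need renaming to match the expected view.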

UPD: the data may have multiple branches:

data2 = [{"masterName": "AAAAAAAAAAA",
          "mainNames": [
            {
                "numbers": [{
                        "date": "2019-05-16",
                        "NumberOne": 222}],
                "name": "randomka"
            },
            {
                "numbers": [{
                        "date": "2019-05-16",
                        "NumberOne": 222},
                    {
                        "date": "2019-07-01",
                        "NumberOne": 341}],
                "name": "randomka"
            }
        ]
    }]

1 answer:

Answer 0 (score: 0)

As @Aayush Mahajan suggested in the comments, it may be simpler to define your own function. Here is one that uses a loop:

out = []
record = data2[0]                                     # Remove first level
for main in record["mainNames"]:                      # Iterate "mainNames"
    sub_dict = {"masterName": record["masterName"]}   # Init new dict (df row) with "masterName"
    sub_dict.update(main["numbers"][0])               # Add all fields from the first "numbers" entry
    sub_dict["name"] = main["name"]                   # Add "name" field
    out.append(sub_dict)                              # Append sub dict to the output list

print(out)
# [{'masterName': 'AAAAAAAAAAA', 'date': '2019-05-16', 'NumberOne': 222, 'name': 'randomka'},
#  {'masterName': 'AAAAAAAAAAA', 'date': '2019-05-16', 'NumberOne': 222, 'name': 'randomka'}]

# Create DataFrame with from_dict
df = pd.DataFrame.from_dict(out)
print(df)
#     masterName        date  NumberOne      name
# 0  AAAAAAAAAAA  2019-05-16        222  randomka
# 1  AAAAAAAAAAA  2019-05-16        222  randomka

Update: you can add an inner loop to iterate over the numbers field:

out = []
record = data2[0]                                         # Remove first level
for main in record["mainNames"]:                          # Iterate "mainNames"
    for numbers in main["numbers"]:                       # Iterate every "numbers" entry
        sub_dict = {"masterName": record["masterName"]}   # Init new dict (df row) with "masterName"
        sub_dict.update(numbers)                          # Add all fields from "numbers"
        sub_dict["name"] = main["name"]                   # Add "name" field
        out.append(sub_dict)                              # Append sub dict to the output list

print(out)
# [{'masterName': 'AAAAAAAAAAA', 'date': '2019-05-16', 'NumberOne': 222, 'name': 'randomka'},
#  {'masterName': 'AAAAAAAAAAA', 'date': '2019-05-16', 'NumberOne': 222, 'name': 'randomka'},
#  {'masterName': 'AAAAAAAAAAA', 'date': '2019-07-01', 'NumberOne': 341, 'name': 'randomka'}]

df = pd.DataFrame.from_dict(out)
print(df)   # column order may vary with the Python and pandas version
#    NumberOne        date   masterName      name
# 0        222  2019-05-16  AAAAAAAAAAA  randomka
# 1        222  2019-05-16  AAAAAAAAAAA  randomka
# 2        341  2019-07-01  AAAAAAAAAAA  randomka

It also still works with the initial data2.
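If the top-level list can hold more than one record, the same loops can be wrapped in a small helper. The following is only a sketch: the function name `flatten_records` and its `list_key` and `inner_key` parameters are made up for illustration, and it assumes every record has the same `mainNames` / `numbers` layout as in the question.

```python
def flatten_records(record, list_key="mainNames", inner_key="numbers"):
    """Return one flat dict per entry of record[list_key][i][inner_key]."""
    # Top-level fields (e.g. masterName) are repeated on every row.
    top = {k: v for k, v in record.items() if k != list_key}
    rows = []
    for item in record[list_key]:
        # Middle-level fields (e.g. name), without the nested list itself.
        mid = {k: v for k, v in item.items() if k != inner_key}
        for entry in item[inner_key]:
            row = dict(top)
            row.update(mid)
            row.update(entry)   # on a key clash, the innermost value wins
            rows.append(row)
    return rows


# The multi-branch data from the question's update.
data2 = [{"masterName": "AAAAAAAAAAA",
          "mainNames": [
              {"numbers": [{"date": "2019-05-16", "NumberOne": 222}],
               "name": "randomka"},
              {"numbers": [{"date": "2019-05-16", "NumberOne": 222},
                           {"date": "2019-07-01", "NumberOne": 341}],
               "name": "randomka"}]}]

# Flatten every top-level record, not just data2[0].
rows = [row for record in data2 for row in flatten_records(record)]
print(rows)
```

The resulting list of dicts can be passed to pd.DataFrame in the same way as `out` above.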

Hope that helps!