How to create a hierarchical DataFrame from a list of dictionaries

Asked: 2020-10-12 05:34:31

Tags: python json pandas dictionary for-loop

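I have the following list of dictionaries that I want to flatten using Python. The data originally comes from Xero.

Here is the sample data I extracted using the API: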

my_dict = [{'RowType': 'Section', 'Title': 'Income', 'Rows': []},{'RowType': 'Section', 'Title': 'Income from Rents', 'Rows': []},
 {'RowType': 'Section',
  'Title': 'Rent Received',
  'Rows': [{'RowType': 'Row',
    'Cells': [{'Value': 'Contract Rent',
      'Attributes': [{'Value': '5',
        'Id': 'account'},
       {'Value': '5', 'Id': 'groupID'}]},
     {'Value': '721093.92',
      'Attributes': [{'Value': '5',
        'Id': 'account'},
       {'Value': '5', 'Id': 'groupID'}]}]},
   {'RowType': 'Row',
    'Cells': [{'Value': 'Rent  - Carparks',
      'Attributes': [{'Value': '95',
        'Id': 'account'}]},
     {'Value': '3523.33',
      'Attributes': [{'Value': '95',
        'Id': 'account'}]}]},
   {'RowType': 'Row',
    'Cells': [{'Value': 'Vacant Tenancies',
      'Attributes': [{'Value': '53',
        'Id': 'account'}]},
     {'Value': '-22226.50',
      'Attributes': [{'Value': '53',
        'Id': 'account'}]}]},
   {'RowType': 'SummaryRow',
    'Cells': [{'Value': 'Total Rent Received'}, {'Value': '702390.75'}]}]},
 {'RowType': 'Section',
  'Title': 'Rent Reductions',
  'Rows': [{'RowType': 'Row',
    'Cells': [{'Value': 'COVID-19 Rent reduction',
      'Attributes': [{'Value': '40',
        'Id': 'account'}]},
     {'Value': '-132478.03',
      'Attributes': [{'Value': '40',
        'Id': 'account'}]}]},
   {'RowType': 'Row',
    'Cells': [{'Value': 'Rent Holiday',
      'Attributes': [{'Value': '4d',
        'Id': 'account'}]},
     {'Value': '-14451.58',
      'Attributes': [{'Value': '4d',
        'Id': 'account'}]}]},
   {'RowType': 'SummaryRow',
    'Cells': [{'Value': 'Total Rent Reductions'}, {'Value': '-146929.61'}]}]}]

The desired output is as follows:

                      Name      Amount Hierarchy_level_3 Hierarchy_level_1  Hierarchy_level_2
0            Contract Rent   721093.92     Rent Received            Income  Income from Rents
1          Rent - Carparks     3523.33     Rent Received            Income  Income from Rents
2         Vacant Tenancies   -22226.50     Rent Received            Income  Income from Rents
3      Total Rent Received   702390.75
4  COVID-19 Rent reduction  -132478.03    Rent Reduction            Income  Income from Rents
...

Can someone help me achieve this? The sample data here is in the format I get from the API. I'm not sure how to flatten it, and I'm fairly new to Python.

1 Answer:

Answer 0: (score: 2)

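Assuming that Hierarchy_level_3 for row 4 in your example should be Rent Received rather than Rent Reduction, and that your example has a fourth hierarchy level, here is a solution. I added the level number and level name because I think they may be more useful than the "hierarchy level" columns, but feel free to remove them.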

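import pandas as pd

# Map each section's position to its title, e.g. 'Hierarchy_level_1' -> 'Income'.
hierarchy = {f'Hierarchy_level_{i+1}': d['Title'] for i, d in enumerate(my_dict)}
all_data = []

for level, d in enumerate(my_dict):
    # Sections with an empty 'Rows' list contribute no records.
    for row in d['Rows']:
        cells = row['Cells']  # cells[0] holds the name, cells[1] the amount
        all_data.append({
            'Name': cells[0]['Value'],
            'Amount': cells[1]['Value'],
            'Level': level,  # zero-based index of the section in my_dict
            'Level_name': hierarchy[f'Hierarchy_level_{level+1}'],
            **hierarchy  # every section title, repeated on each row
        })
df = pd.DataFrame(all_data)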

Output:

                   Name      Amount  Level       Level_name Hierarchy_level_1  Hierarchy_level_2 Hierarchy_level_3 Hierarchy_level_4
0            Contract Rent   721093.92      2    Rent Received            Income  Income from Rents     Rent Received   Rent Reductions
1         Rent  - Carparks     3523.33      2    Rent Received            Income  Income from Rents     Rent Received   Rent Reductions
2         Vacant Tenancies   -22226.50      2    Rent Received            Income  Income from Rents     Rent Received   Rent Reductions
3      Total Rent Received   702390.75      2    Rent Received            Income  Income from Rents     Rent Received   Rent Reductions
4  COVID-19 Rent reduction  -132478.03      3  Rent Reductions            Income  Income from Rents     Rent Received   Rent Reductions
5             Rent Holiday   -14451.58      3  Rent Reductions            Income  Income from Rents     Rent Received   Rent Reductions
6    Total Rent Reductions  -146929.61      3  Rent Reductions            Income  Income from Rents     Rent Received   Rent Reductions
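
The Amount values in the sample data are strings, so the Amount column above has object dtype. If the amounts need to be summed or compared, a conversion along these lines may help. This is only a minimal sketch: the small `df` below is rebuilt by hand from three of the rows above rather than taken from the API.

```python
import pandas as pd

# Three rows copied from the output above; Amount is still a string here.
df = pd.DataFrame({
    'Name': ['Contract Rent', 'Rent  - Carparks', 'Vacant Tenancies'],
    'Amount': ['721093.92', '3523.33', '-22226.50'],
})

# pd.to_numeric parses the strings into floats; errors='coerce' turns
# anything unparseable into NaN instead of raising an exception.
df['Amount'] = pd.to_numeric(df['Amount'], errors='coerce')

# With a numeric column, the three rows can be totalled and checked
# against the 'Total Rent Received' summary row.
total = round(df['Amount'].sum(), 2)
print(df.dtypes)
print(total)
```

Whether `errors='coerce'` is the right choice depends on the data; dropping that argument makes an unexpected value raise an error instead of silently becoming NaN.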

Edit: since only 3 hierarchy levels are needed:

import pandas as pd

# Map each section's position to its title, e.g. 'Hierarchy_level_1' -> 'Income'.
hierarchy = {f'Hierarchy_level_{i+1}': d['Title'] for i, d in enumerate(my_dict)}
all_data = []

for level, d in enumerate(my_dict):
    for row in d['Rows']:
        cells = row['Cells']
        all_data.append({
            'Name': cells[0]['Value'],
            'Amount': cells[1]['Value'],
            # The first two sections are taken as the fixed top-level headings.
            'Hierarchy_level_1': hierarchy['Hierarchy_level_1'],
            'Hierarchy_level_2': hierarchy['Hierarchy_level_2'],
            # The third level is the title of the section this row belongs to.
            'Hierarchy_level_3': hierarchy[f'Hierarchy_level_{level+1}'],
        })
df = pd.DataFrame(all_data)

Output:

                      Name      Amount Hierarchy_level_1  Hierarchy_level_2 Hierarchy_level_3
0            Contract Rent   721093.92            Income  Income from Rents     Rent Received
1         Rent  - Carparks     3523.33            Income  Income from Rents     Rent Received
2         Vacant Tenancies   -22226.50            Income  Income from Rents     Rent Received
3      Total Rent Received   702390.75            Income  Income from Rents     Rent Received
4  COVID-19 Rent reduction  -132478.03            Income  Income from Rents   Rent Reductions
5             Rent Holiday   -14451.58            Income  Income from Rents   Rent Reductions
6    Total Rent Reductions  -146929.61            Income  Income from Rents   Rent Reductions
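
The desired output in the question leaves the hierarchy columns blank for the summary row, and the code above relies on the first two sections being the level 1 and level 2 headings. One possible variation, sketched below, treats every section with an empty Rows list as a parent heading and leaves the hierarchy columns unset for SummaryRow entries. The `flatten_report` helper and the trimmed `sample` data are illustrative names, not part of the Xero API, and the rule that empty sections are parent headings is an assumption based on this sample only.

```python
import pandas as pd

# Trimmed copy of the question's data (Attributes omitted for brevity).
sample = [
    {'RowType': 'Section', 'Title': 'Income', 'Rows': []},
    {'RowType': 'Section', 'Title': 'Income from Rents', 'Rows': []},
    {'RowType': 'Section', 'Title': 'Rent Received', 'Rows': [
        {'RowType': 'Row',
         'Cells': [{'Value': 'Contract Rent'}, {'Value': '721093.92'}]},
        {'RowType': 'SummaryRow',
         'Cells': [{'Value': 'Total Rent Received'}, {'Value': '702390.75'}]},
    ]},
    {'RowType': 'Section', 'Title': 'Rent Reductions', 'Rows': [
        {'RowType': 'Row',
         'Cells': [{'Value': 'Rent Holiday'}, {'Value': '-14451.58'}]},
    ]},
]


def flatten_report(sections):
    """Hypothetical helper: one output record per Row or SummaryRow."""
    parents = []   # titles of sections that have no rows of their own
    records = []
    for section in sections:
        if not section['Rows']:
            # Assumption: an empty section is a heading for what follows.
            parents.append(section['Title'])
            continue
        for row in section['Rows']:
            name, amount = (cell['Value'] for cell in row['Cells'][:2])
            record = {'Name': name, 'Amount': amount}
            if row['RowType'] != 'SummaryRow':
                # Parent headings first, then the section the row belongs to.
                titles = parents + [section['Title']]
                for i, title in enumerate(titles, start=1):
                    record[f'Hierarchy_level_{i}'] = title
            records.append(record)
    return pd.DataFrame(records)


df = flatten_report(sample)
print(df)
```

Summary rows come out with NaN in the hierarchy columns; `df.fillna('')` would display them as blanks, as in the question's table.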