json_normalize with lists in pandas

Time: 2020-04-26 23:27:40

Tags: python json pandas dataframe

I have a JSON file with the following nested structure.

[
  {
    "unitCode": "ABCD",
    "bedType": "Adult MT/MS",
    "census": 13,
    "subCensus": null,
    "censusDetails": [],
    "occupancy": 62,
    "occupancyStar": null,
    "occupancyAlertStatus": null,
    "columns": [
      {
        "id": "blockedBeds",
        "value": "1",
        "hoverDetails": [
          {
            "id": "bedName",
            "value": "23_1"
          }
        ]
      },
      {
        "id": "unOccupied",
        "value": "2",
        "hoverDetails": [
          {
            "id": "bedName",
            "value": "20a_2"
          },
          {
            "id": "bedName",
            "value": "22a_1"
          }
        ]
      }
    ],
    "codeEvents": null,
    "codeEventDetails": null
  },
  {
    "unitCode": "EFGH",
    "bedType": "Adult MT/MS",
    "census": 14,
    "subCensus": null,
    "censusDetails": [],
    "occupancy": 61,
    "occupancyStar": null,
    "occupancyAlertStatus": null,
    "columns": [
      {
        "id": "blockedBeds",
        "value": "1",
        "hoverDetails": [
          {
            "id": "bedName",
            "value": "52_2"
          }
        ]
      },
      {
        "id": "unOccupied",
        "value": "1",
        "hoverDetails": [
          {
            "id": "bedName",
            "value": "53_1"
          }
        ]
      }
    ],
    "codeEvents": null,
    "codeEventDetails": null
  }
]

I am trying to flatten the file and convert it to a DataFrame using json_normalize. Here is my code:

testhover = json_normalize(data, ['columns'], ['unitCode'])

The DataFrame I get looks like this:

    id          | value |   hoverDetails                                       | unitCode
0   blockedBeds | 1     |   [{'id': 'bedName', 'value': '23_1'}]               | ABCD
1   unOccupied  | 2     |   [{'id': 'bedName', 'value': '20a_2'}, {'id': '...' | ABCD
2   blockedBeds | 1     |   [{'id': 'bedName', 'value': '52_2'}]               | EFGH
3   unOccupied  | 1     |   [{'id': 'bedName', 'value': '53_1'}]               | EFGH

I need it in the following format:

    blockedBeds   |  unOccupied  |   unitCode
0 | '23_1'        |  NaN         |   ABCD
1 | NaN           |  '20a_2'     |   ABCD
2 | NaN           |  '22a_1'     |   ABCD
3 | '52_2'        |  NaN         |   EFGH
4 | NaN           |  '53_1'      |   EFGH

I can't seem to get at the nested bed data (the values inside hoverDetails). I would really appreciate any help.

1 answer:

Answer 0 (score: 3)

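You should build a list of dictionaries in a loop and create the DataFrame from that list. Each hoverDetails entry becomes one row, keyed by the id of the column it belongs to.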

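import pandas as pd

# parsed_json is assumed to be the list loaded from the JSON file above
vals = []

for item in parsed_json:
    unit_code = item['unitCode']
    for col in item['columns']:
        # one output row per hoverDetails entry, keyed by the column id
        for hd in col['hoverDetails']:
            vals.append({'unitCode': unit_code,
                         col['id']: hd['value']})

pd.DataFrame(vals)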

Output

  unitCode blockedBeds unOccupied
0     ABCD        23_1        NaN
1     ABCD         NaN      20a_2
2     ABCD         NaN      22a_1
3     EFGH        52_2        NaN
4     EFGH         NaN       53_1
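
If you would rather stay with json_normalize, as in the question, a nested record_path may also get you there. The sketch below is not part of the original answer: it assumes pandas 1.0 or later (where pd.json_normalize is available), uses a trimmed-down copy of the sample data with only the relevant keys, and the names flat and wide are just illustrative. It walks down to hoverDetails, carries unitCode and the parent column id along as metadata, then pivots the id values into columns.

```python
import pandas as pd

# Trimmed-down copy of the sample data from the question (relevant keys only).
data = [
    {"unitCode": "ABCD",
     "columns": [
         {"id": "blockedBeds", "value": "1",
          "hoverDetails": [{"id": "bedName", "value": "23_1"}]},
         {"id": "unOccupied", "value": "2",
          "hoverDetails": [{"id": "bedName", "value": "20a_2"},
                           {"id": "bedName", "value": "22a_1"}]},
     ]},
    {"unitCode": "EFGH",
     "columns": [
         {"id": "blockedBeds", "value": "1",
          "hoverDetails": [{"id": "bedName", "value": "52_2"}]},
         {"id": "unOccupied", "value": "1",
          "hoverDetails": [{"id": "bedName", "value": "53_1"}]},
     ]},
]

# One row per hoverDetails entry; unitCode and the parent column's id
# are repeated on each row as metadata (the latter as 'columns.id').
flat = pd.json_normalize(
    data,
    record_path=["columns", "hoverDetails"],
    meta=["unitCode", ["columns", "id"]],
)

# Spread the bed names into one column per parent id, keeping the row index,
# then attach unitCode again (aligned on that same index).
wide = flat.pivot(columns="columns.id", values="value")
wide["unitCode"] = flat["unitCode"]

print(wide)
```

This should give the same five rows as the loop above, with NaN wherever a row belongs to the other column. The loop is arguably easier to read; the json_normalize version mainly helps if you want the intermediate flat table as well.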