How to create a DataFrame from nested JSON with json_normalize

Time: 2019-11-08 21:27:17

Tags: pandas dataframe

I have this nested JSON:

 {
        "shipmentDate": "2019-10-14T14:00:35+02:00",
        "shipmentId": 683160924,
        "shipmentItems": [
            {
                "orderId": "2596035410",
                "orderItemId": "BFC0000318171522"
            }
        ],
        "shipmentReference": "081234500868579440",
        "transport": {
            "transportId": 422147262
        }
    },
    {
        "shipmentDate": "2019-10-14T00:51:03+02:00",
        "shipmentId": 683020323,
        "shipmentItems": [
            {
                "orderId": "2595582210",
                "orderItemId": "BFC0000318038054"
            }
        ],
        "shipmentReference": "081234500867544944",
        "transport": {
            "transportId": 422001974
        }
    }

I use this to build the DataFrame:

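import json
from pandas import json_normalize  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize

# r is the HTTP response whose body contains the JSON above under the 'shipments' key
parsed_data = json.loads(r.text)
d = json_normalize(parsed_data['shipments'])

print(d.head())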

My output:

   shipmentId               shipmentDate   shipmentReference                                      shipmentItems  transport.transportId
0   689165626  2019-11-08T18:57:31+01:00  081234500924235822  [{'orderItemId': 'BFC0000331613400', 'orderId'...              428363308
1   689125502  2019-11-08T16:30:02+01:00  081234500923779723  [{'orderItemId': 'BFC0000331548600', 'orderId'...              428321764
2   689109783  2019-11-08T15:28:32+01:00  081234500923650213  [{'orderItemId': 'BFC0000331516105', 'orderId'...              428305148
3   689053625  2019-11-08T11:56:32+01:00  081234500923108493  [{'orderItemId': 'BFC0000331462628', 'orderId'...              428245727
4   689053493  2019-11-08T11:56:02+01:00  081234500923108813  [{'orderItemId': 'BFC0000331459706', 'orderId'...              428245587

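However, shipmentItems still shows up as nested JSON. How can I get two columns, shipmentItems.orderId and shipmentItems.orderItemId, the same way the transport column was flattened into transport.transportId?

1 answer:

Answer 0: (score: 1)

You can try this: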

data =  [
  {
    "shipmentDate": "2019-10-14T14:00:35+02:00",
    "shipmentId": 683160924,
    "shipmentItems": [
      {
        "orderId": "2596035410",
        "orderItemId": "BFC0000318171522"
      }
    ],
    "shipmentReference": "081234500868579440",
    "transport": {
      "transportId": 422147262
    }
  },
  {
    "shipmentDate": "2019-10-14T00:51:03+02:00",
    "shipmentId": 683020323,
    "shipmentItems": [
      {
        "orderId": "2595582210",
        "orderItemId": "BFC0000318038054"
      }
    ],
    "shipmentReference": "081234500867544944",
    "transport": {
      "transportId": 422001974
    }
  }
]

# pandas >= 1.0; on older versions use: from pandas.io.json import json_normalize
from pandas import json_normalize

columns = [
    'shipmentDate', 
    'shipmentId', 
    'shipmentReference', 
    ['transport', 'transportId']
]


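# One row per entry in 'shipmentItems'; the fields listed in columns are repeated on each row
df = json_normalize(data, record_path='shipmentItems', meta=columns)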

Here is the result:

>>> data
[{'shipmentDate': '2019-10-14T14:00:35+02:00', 'shipmentId': 683160924, 'shipmentItems': [{'orderId': '2596035410', 'orderItemId': 'BFC0000318171522'}], 'shipmentReference': '081234500868579440', 'transport': {'transportId': 422147262}}, {'shipmentDate': '2019-10-14T00:51:03+02:00', 'shipmentId': 683020323, 'shipmentItems': [{'orderId': '2595582210', 'orderItemId': 'BFC0000318038054'}], 'shipmentReference': '081234500867544944', 'transport': {'transportId': 422001974}}]
>>> columns = [
...     'shipmentDate',
...     'shipmentId',
...     'shipmentReference',
...     ['transport', 'transportId']
... ]
>>> json_normalize(data, 'shipmentItems', columns)
      orderId       orderItemId               shipmentDate shipmentId  \
0  2596035410  BFC0000318171522  2019-10-14T14:00:35+02:00  683160924
1  2595582210  BFC0000318038054  2019-10-14T00:51:03+02:00  683020323

    shipmentReference transport.transportId
0  081234500868579440             422147262
1  081234500867544944             422001974
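
The result above names the item columns orderId and orderItemId, while the question asked for shipmentItems.orderId and shipmentItems.orderItemId. A minimal sketch of one way to get those names, assuming pandas 1.0 or later (where json_normalize is available as pd.json_normalize), is to pass the record_prefix parameter. The data below is a trimmed copy of the sample from the question.

```python
import pandas as pd

# Trimmed copy of the sample data from the question.
data = [
    {
        "shipmentDate": "2019-10-14T14:00:35+02:00",
        "shipmentId": 683160924,
        "shipmentItems": [
            {"orderId": "2596035410", "orderItemId": "BFC0000318171522"}
        ],
        "shipmentReference": "081234500868579440",
        "transport": {"transportId": 422147262},
    },
    {
        "shipmentDate": "2019-10-14T00:51:03+02:00",
        "shipmentId": 683020323,
        "shipmentItems": [
            {"orderId": "2595582210", "orderItemId": "BFC0000318038054"}
        ],
        "shipmentReference": "081234500867544944",
        "transport": {"transportId": 422001974},
    },
]

df = pd.json_normalize(
    data,
    record_path="shipmentItems",      # the list to expand, one row per item
    meta=[
        "shipmentDate",
        "shipmentId",
        "shipmentReference",
        ["transport", "transportId"],  # nested key, becomes transport.transportId
    ],
    record_prefix="shipmentItems.",   # prefix added to the columns taken from each item
)

print(df.columns.tolist())
```

Note that a shipment with several entries in shipmentItems produces several rows, each repeating that shipment's meta fields.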