I have this nested JSON:
{
    "shipmentDate": "2019-10-14T14:00:35+02:00",
    "shipmentId": 683160924,
    "shipmentItems": [
        {
            "orderId": "2596035410",
            "orderItemId": "BFC0000318171522"
        }
    ],
    "shipmentReference": "081234500868579440",
    "transport": {
        "transportId": 422147262
    }
},
{
    "shipmentDate": "2019-10-14T00:51:03+02:00",
    "shipmentId": 683020323,
    "shipmentItems": [
        {
            "orderId": "2595582210",
            "orderItemId": "BFC0000318038054"
        }
    ],
    "shipmentReference": "081234500867544944",
    "transport": {
        "transportId": 422001974
    }
}
I use this to get a DataFrame:
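import json
from pandas import json_normalize  # pandas >= 1.0; earlier: pandas.io.json
# r is assumed to be the HTTP response holding the JSON (e.g. from requests)
parsed_data = json.loads(r.text)
d = json_normalize(parsed_data['shipments'])
print(d.head())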
My output:
shipmentId shipmentDate shipmentReference shipmentItems transport.transportId
0 689165626 2019-11-08T18:57:31+01:00 081234500924235822 [{'orderItemId': 'BFC0000331613400', 'orderId'... 428363308
1 689125502 2019-11-08T16:30:02+01:00 081234500923779723 [{'orderItemId': 'BFC0000331548600', 'orderId'... 428321764
2 689109783 2019-11-08T15:28:32+01:00 081234500923650213 [{'orderItemId': 'BFC0000331516105', 'orderId'... 428305148
3 689053625 2019-11-08T11:56:32+01:00 081234500923108493 [{'orderItemId': 'BFC0000331462628', 'orderId'... 428245727
4 689053493 2019-11-08T11:56:02+01:00 081234500923108813 [{'orderItemId': 'BFC0000331459706', 'orderId'... 428245587
However, shipmentItems still shows nested JSON. How can I get two columns, shipmentItems.orderId and shipmentItems.orderItemId, the same way the transport column was flattened?
Answer 0 (score: 1)
You can try this:
data = [
    {
        "shipmentDate": "2019-10-14T14:00:35+02:00",
        "shipmentId": 683160924,
        "shipmentItems": [
            {
                "orderId": "2596035410",
                "orderItemId": "BFC0000318171522"
            }
        ],
        "shipmentReference": "081234500868579440",
        "transport": {
            "transportId": 422147262
        }
    },
    {
        "shipmentDate": "2019-10-14T00:51:03+02:00",
        "shipmentId": 683020323,
        "shipmentItems": [
            {
                "orderId": "2595582210",
                "orderItemId": "BFC0000318038054"
            }
        ],
        "shipmentReference": "081234500867544944",
        "transport": {
            "transportId": 422001974
        }
    }
]
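# pandas >= 1.0; on older versions use: from pandas.io.json import json_normalize
from pandas import json_normalize

# Parent-level fields to repeat on every item row; a nested key is given as a path list
columns = [
    'shipmentDate',
    'shipmentId',
    'shipmentReference',
    ['transport', 'transportId']
]
# record_path: the list to expand into rows; meta: parent fields to carry along
df = json_normalize(data, record_path='shipmentItems', meta=columns)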
Here is the result:
>>> data
[{'shipmentDate': '2019-10-14T14:00:35+02:00', 'shipmentId': 683160924, 'shipmentItems': [{'orderId': '2596035410', 'orderItemId': 'BFC0000318171522'}], 'shipmentReference': '081234500868579440', 'transport': {'transportId': 422147262}}, {'shipmentDate': '2019-10-14T00:51:03+02:00', 'shipmentId': 683020323, 'shipmentItems': [{'orderId': '2595582210', 'orderItemId': 'BFC0000318038054'}], 'shipmentReference': '081234500867544944', 'transport': {'transportId': 422001974}}]
>>> columns = [
...     'shipmentDate',
...     'shipmentId',
...     'shipmentReference',
...     ['transport', 'transportId']
... ]
>>> json_normalize(data, 'shipmentItems', columns)
orderId orderItemId shipmentDate shipmentId \
0 2596035410 BFC0000318171522 2019-10-14T14:00:35+02:00 683160924
1 2595582210 BFC0000318038054 2019-10-14T00:51:03+02:00 683020323
shipmentReference transport.transportId
0 081234500868579440 422147262
1 081234500867544944 422001974
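Note that the item columns above come out as plain orderId and orderItemId. If you want them named shipmentItems.orderId and shipmentItems.orderItemId as in the question, one option is the record_prefix argument of json_normalize. A minimal sketch, assuming pandas >= 1.0 and using a trimmed copy of the same two shipments (the trimming is only for brevity):

```python
import pandas as pd

# Same two shipments as above, reduced to the fields used in this sketch.
data = [
    {
        "shipmentId": 683160924,
        "shipmentItems": [
            {"orderId": "2596035410", "orderItemId": "BFC0000318171522"}
        ],
        "transport": {"transportId": 422147262},
    },
    {
        "shipmentId": 683020323,
        "shipmentItems": [
            {"orderId": "2595582210", "orderItemId": "BFC0000318038054"}
        ],
        "transport": {"transportId": 422001974},
    },
]

df = pd.json_normalize(
    data,
    record_path="shipmentItems",       # one output row per shipment item
    meta=["shipmentId", ["transport", "transportId"]],  # parent fields to repeat
    record_prefix="shipmentItems.",    # prefix added to the item columns only
)

# Item columns come first (prefixed), followed by the meta columns.
print(df.columns.tolist())
```

Since there is one row per item, a shipment with several items would appear on several rows, with its shipmentId and transport.transportId repeated on each.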