I am currently trying to work with a JSON response in the following format (shown here as a Python dict):
response = {
    "leads": [{
        "id": 208827181,
        "campaignId": 2595,
        "contactId": 2919361,
        "contactAttempts": 1,
        "contactAttemptsInvalid": 0,
        "lastModifiedTime": "2017-03-14T13:37:20Z",
        "nextContactTime": "2017-03-15T14:37:20Z",
        "created": "2017-03-14T13:16:42Z",
        "updated": "2017-03-14T13:37:20Z",
        "lastContactedBy": 1271,
        "status": "automaticRedial",
        "active": True,
        "masterData": [{
            "id": 2054,
            "label": "Firmanavn",
            "value": "Firma_1"
        },
        {
            "id": 2055,
            "label": "Adresse",
            "value": "Gadenavn_1"
        },
        {
            "id": 2056,
            "label": "Postnr.",
            "value": "2000"
        },
        {
            "id": 2057,
            "label": "Bydel",
            "value": "Frederiksberg"
        },
        {
            "id": 2058,
            "label": "Telefonnummer",
            "value": "25252525"
        }
        ]
    }]
}
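masterData is a nested list, but its length varies from entry to entry. In other words, each row/entry can have a different set of columns. For each entry I want to keep one or more specific columns. However, because the nested lists have different lengths, my current position-based indexing breaks. Here is my code: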
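import pandas as pd

leads = pd.json_normalize(response['leads'])
# Column 4 = the 5th masterData entry of every lead; this positional
# lookup is what breaks when the masterData lists differ in length
df = pd.concat([leads.drop(columns='masterData'),
                pd.DataFrame(list(pd.DataFrame(list(leads['masterData']))[4]))
                  .drop(columns=['id', 'label'])
                  .rename(columns={"value": "tlf"})], axis=1)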
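Since the lists differ in length, one way to avoid the positional [4] is to look the field up by its "label" instead of by its position. The sketch below assumes the phone number is always labelled "Telefonnummer"; the trimmed sample data, the second lead (which has no phone entry) and the helper get_master_value are hypothetical, added only to show what happens when a field is missing.

```python
import pandas as pd

# Hypothetical sample: the second lead's masterData is shorter and has no phone entry
response = {"leads": [
    {"id": 208827181, "status": "automaticRedial", "masterData": [
        {"id": 2054, "label": "Firmanavn", "value": "Firma_1"},
        {"id": 2058, "label": "Telefonnummer", "value": "25252525"},
    ]},
    {"id": 208827999, "status": "success", "masterData": [
        {"id": 2054, "label": "Firmanavn", "value": "Firma_2"},
    ]},
]}


def get_master_value(master_data, label):
    """Hypothetical helper: return the 'value' of the first entry with this label, or None."""
    for item in master_data:
        if item.get("label") == label:
            return item.get("value")
    return None


# json_normalize leaves the nested masterData list as a single object column
leads = pd.json_normalize(response['leads'])
# Look the field up by label, so list length and order no longer matter
leads['tlf'] = leads['masterData'].apply(get_master_value, label='Telefonnummer')
df = leads.drop(columns='masterData')
print(df)
```

Leads that lack the label simply get a missing value in tlf instead of breaking the whole concat; to keep several fields, call the helper once per label.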
The desired output is:
active campaignId contactAttempts contactAttemptsInvalid contactId created id lastContactedBy lastModifiedTime nextContactTime resultData status updated tlf
0 True 2595 1 0 2919361 2017-03-14T13:16:42Z 208827181 1271.0 2017-03-14T13:37:20Z 2017-03-15T14:37:20Z [] automaticRedial 2017-03-14T13:37:20Z 37373737
1 True 2595 2 0 2919359 2017-03-14T13:16:42Z 208827179 1271.0 2017-03-14T13:33:30Z 2017-03-15T14:33:30Z [] privateRedial 2017-03-14T13:33:30Z 55555555
2 True 2595 1 0 2919360 2017-03-14T13:16:42Z 208827180 1271.0 2017-03-14T13:36:06Z None [] success 2017-03-14T13:36:06Z 22222222
3 True 2595 1 0 2919362 2017-03-14T13:16:42Z 208827182 1271.0 2017-03-14T13:56:39Z None [] success 2017-03-14T13:56:39Z 34343434
where "tlf" is the column taken from "masterData".
Answer 0 (score: 1)
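Use json_normalize alone, passing 'masterData' as the record path and the top-level column names to keep as a list (the meta argument):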
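import pandas as pd

# Top-level fields of each lead, repeated on every masterData row
L = ['active', 'campaignId', 'contactAttempts', 'contactAttemptsInvalid',
     'contactId', 'created', 'id', 'lastContactedBy', 'lastModifiedTime',
     'nextContactTime', 'status', 'updated']
# record_prefix is needed because both the lead and its masterData entries have an 'id'
df = pd.json_normalize(response['leads'], 'masterData', L, record_prefix='masterData.')
print(df)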
masterData.id masterData.label masterData.value active campaignId \
0 2054 Firmanavn Firma_1 True 2595
1 2055 Adresse Gadenavn_1 True 2595
2 2056 Postnr. 2000 True 2595
3 2057 Bydel Frederiksberg True 2595
4 2058 Telefonnummer 25252525 True 2595
contactAttempts contactAttemptsInvalid contactId created \
0 1 0 2919361 2017-03-14T13:16:42Z
1 1 0 2919361 2017-03-14T13:16:42Z
2 1 0 2919361 2017-03-14T13:16:42Z
3 1 0 2919361 2017-03-14T13:16:42Z
4 1 0 2919361 2017-03-14T13:16:42Z
id lastContactedBy lastModifiedTime nextContactTime \
0 208827181 1271 2017-03-14T13:37:20Z 2017-03-15T14:37:20Z
1 208827181 1271 2017-03-14T13:37:20Z 2017-03-15T14:37:20Z
2 208827181 1271 2017-03-14T13:37:20Z 2017-03-15T14:37:20Z
3 208827181 1271 2017-03-14T13:37:20Z 2017-03-15T14:37:20Z
4 208827181 1271 2017-03-14T13:37:20Z 2017-03-15T14:37:20Z
status updated
0 automaticRedial 2017-03-14T13:37:20Z
1 automaticRedial 2017-03-14T13:37:20Z
2 automaticRedial 2017-03-14T13:37:20Z
3 automaticRedial 2017-03-14T13:37:20Z
4 automaticRedial 2017-03-14T13:37:20Z
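Note that this result has one row per masterData entry, not one row per lead as in the desired output. A possible follow-up step, sketched here under the assumption that each label appears at most once per lead, is to pivot the labels into columns and keep only the one needed. The sample data and the shortened meta list ['id', 'status'] are illustrative only, not the asker's full data.

```python
import pandas as pd

# Hypothetical sample: the second lead has no 'Telefonnummer' entry
response = {"leads": [
    {"id": 208827181, "status": "automaticRedial", "masterData": [
        {"id": 2054, "label": "Firmanavn", "value": "Firma_1"},
        {"id": 2058, "label": "Telefonnummer", "value": "25252525"},
    ]},
    {"id": 208827999, "status": "success", "masterData": [
        {"id": 2054, "label": "Firmanavn", "value": "Firma_2"},
    ]},
]}

# Long format, as in the answer: one row per masterData entry
long_df = pd.json_normalize(response['leads'], 'masterData', ['id', 'status'],
                            record_prefix='masterData.')

# Wide format: one row per lead, one column per masterData label (NaN where absent)
wide = (long_df.pivot(index=['id', 'status'], columns='masterData.label',
                      values='masterData.value')
        .reset_index())

# Keep only the label we need and give it the desired column name
df = wide[['id', 'status', 'Telefonnummer']].rename(columns={'Telefonnummer': 'tlf'})
print(df)
```

A lead whose masterData list is empty produces no records and therefore disappears from the pivot. DataFrame.pivot raises a ValueError if the same label occurs twice for one lead; pivot_table with aggfunc='first' would tolerate that.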