The JSON I get from the movie database API looks like this:
{
  "id": 550,
  "cast": [
    {
      "cast_id": 4,
      "character": "The Narrator",
      "credit_id": "52fe4250c3a36847f80149f3",
      "gender": 2,
      "id": 819,
      "name": "Edward Norton",
      "order": 0,
      "profile_path": "/eIkFHNlfretLS1spAcIoihKUS62.jpg"
    }
  ],
  "crew": [
    {
      "credit_id": "56380f0cc3a3681b5c0200be",
      "department": "Writing",
      "gender": 0,
      "id": 7469,
      "job": "Screenplay",
      "name": "Jim Uhls",
      "profile_path": null
    },
    {
      "credit_id": "57fe1e549251410699007177",
      "department": "Costume & Make-Up",
      "gender": 1,
      "id": 1693424,
      "job": "Assistant Costume Designer",
      "name": "Mirela Rupic",
      "profile_path": "/5z0I2eRwBrJjSv27ig4VnU0lmCZ.jpg"
    }
  ]
}
So the object has the fields id (an int), cast (a list) and crew (a list).
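For each "parent" id, I need to extract the data from crew[] into a single pandas DataFrame with the columns id, crew_id, job, name. So far I am using df = pd.json_normalize(crew) to get everything from crew, but that loses the parent id. How do I get the data out of the array? The data frame I want looks like this: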
id   crew_id  job                         name
550  7469     Screenplay                  Jim Uhls
550  1693424  Assistant Costume Designer  Mirela Rupic
551  someid.  somejob.                    some name
etc. etc.     etc                         etc
Answer 0 (score: 1)
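I used a list comprehension to pull out the details. Since the main id is constant for a given response, the only part that needs iterating over is the "crew" list in the dictionary.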
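import pandas as pd

# d is the parsed JSON response (a dict) shown in the question
M = [(d['id'],      # parent movie id, the same for every row
      i['id'],      # crew member id
      i['job'],
      i['name'])
     for i in d['crew']]
df = pd.DataFrame(M, columns=['id', 'crew_id', 'job', 'name'])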
    id  crew_id                         job          name
0  550     7469                  Screenplay      Jim Uhls
1  550  1693424  Assistant Costume Designer  Mirela Rupic
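Since the question mentions more than one movie (550, 551, ...), here is a minimal sketch of how the same result might be obtained for a list of responses with pd.json_normalize, using its record_path, meta and record_prefix parameters. The movies list, the trimmed crew entries and the second movie's crew member are made up for illustration; only the field names come from the JSON in the question. The record_prefix is there because both the movie and each crew entry have a field called id, which would otherwise clash.

```python
import pandas as pd

# Hypothetical list of parsed API responses, trimmed to the fields used here.
# The entry for movie 551 is invented purely for illustration.
movies = [
    {"id": 550, "crew": [
        {"id": 7469, "job": "Screenplay", "name": "Jim Uhls"},
        {"id": 1693424, "job": "Assistant Costume Designer", "name": "Mirela Rupic"},
    ]},
    {"id": 551, "crew": [
        {"id": 1, "job": "Director", "name": "Example Person"},
    ]},
]

crew_df = (
    pd.json_normalize(
        movies,
        record_path="crew",     # one output row per crew entry
        meta=["id"],            # carry the parent movie id onto each row
        record_prefix="crew_",  # crew fields become crew_id, crew_job, crew_name
    )
    .rename(columns={"crew_job": "job", "crew_name": "name"})
    [["id", "crew_id", "job", "name"]]
)
print(crew_df)
```

The list comprehension from the answer would work here too, with an extra outer loop over movies; json_normalize mostly saves writing that loop by hand.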