Extracting data from a JSON array into a pandas DataFrame

Time: 2020-01-25 10:33:42

Tags: python json pandas

The JSON I get from the movie database API looks like this:

{
  "id": 550,
  "cast": [
    {
      "cast_id": 4,
      "character": "The Narrator",
      "credit_id": "52fe4250c3a36847f80149f3",
      "gender": 2,
      "id": 819,
      "name": "Edward Norton",
      "order": 0,
      "profile_path": "/eIkFHNlfretLS1spAcIoihKUS62.jpg"
    }
  ],
  "crew": [
    {
      "credit_id": "56380f0cc3a3681b5c0200be",
      "department": "Writing",
      "gender": 0,
      "id": 7469,
      "job": "Screenplay",
      "name": "Jim Uhls",
      "profile_path": null
    },
    {
      "credit_id": "57fe1e549251410699007177",
      "department": "Costume & Make-Up",
      "gender": 1,
      "id": 1693424,
      "job": "Assistant Costume Designer",
      "name": "Mirela Rupic",
      "profile_path": "/5z0I2eRwBrJjSv27ig4VnU0lmCZ.jpg"
    }
  ]
}

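So the object has the fields id (int), cast [] and crew []. For every "parent" id, I need to extract the data from crew[] into a single pandas DataFrame with the columns id, crew_id, job, name. So far I am using df = json_normalize(crew) to get everything in crew. How can I get the data out of the array? The DataFrame I want looks like this: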

id     crew_id  job                         name
550    7469     Screenplay                  Jim Uhls
550    1693424  Assistant Costume Designer  Mirela Rupic
551    someid   somejob                     some name
etc.   etc.     etc.                        etc.

1 Answer:

Answer 0: (score: 1)

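I used a list comprehension to pull out the details. Since the main id is constant for a given object, the only part that needs to be iterated over is the list under the "crew" key of the dictionary.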

import pandas as pd

# d is the parsed JSON object (a dict) shown in the question
M = [(d['id'],
      i['id'],
      i['job'],
      i['name'])
     for i in d['crew']]

df = pd.DataFrame(M, columns=['id', 'crew_id', 'job', 'name'])

    id  crew_id                         job          name
0  550     7469                  Screenplay      Jim Uhls
1  550  1693424  Assistant Costume Designer  Mirela Rupic
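
If there are several movie objects rather than one, a possible alternative is pandas' own json_normalize with record_path and meta, which repeats the parent id for each crew entry. This is only a sketch under some assumptions: pandas 1.0 or later (where json_normalize is available as pd.json_normalize), a hypothetical list named movies holding the parsed responses, and a made-up second movie (id 551) with placeholder crew values, included purely for illustration.

```python
import pandas as pd

# Hypothetical input: a list of parsed API responses, trimmed to the
# fields used here. The second entry (id 551) is placeholder data.
movies = [
    {"id": 550, "crew": [
        {"id": 7469, "job": "Screenplay", "name": "Jim Uhls"},
        {"id": 1693424, "job": "Assistant Costume Designer", "name": "Mirela Rupic"},
    ]},
    {"id": 551, "crew": [
        {"id": 1, "job": "Director", "name": "Example Person"},
    ]},
]

# One row per crew member; the parent "id" is carried along through `meta`.
# record_prefix is needed because both the movie and the crew entry have an
# "id" key, and json_normalize raises on conflicting column names.
flat = pd.json_normalize(
    movies,
    record_path="crew",
    meta=["id"],
    record_prefix="crew_",
)

# Keep only the requested columns and restore the short names.
df = (
    flat[["id", "crew_id", "crew_job", "crew_name"]]
    .rename(columns={"crew_job": "job", "crew_name": "name"})
)

print(df)
```

The same result can be had by wrapping the list comprehension above in an outer loop over the movies; json_normalize mainly saves writing that loop by hand.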