
Question

This is from an R guy.

I have this mess in a Pandas column, data['crew']:

array(["[{'credit_id': '54d5356ec3a3683ba0000039', 'department': 'Production', 'gender': 1, 'id': 494, 'job': 'Casting', 'name': 'Terri Taylor', 'profile_path': None}, {'credit_id': '56407fa89251417055000b58', 'department': 'Sound', 'gender': 0, 'id': 6745, 'job': 'Music Editor', 'name': 'Richard Henderson', 'profile_path': None}, {'credit_id': '5789212392514135d60025fd', 'department': 'Production', 'gender': 2, 'id': 9250, 'job': 'Executive In Charge Of Production', 'name': 'Jeffrey Stott', 'profile_path': None}, {'credit_id': '57892074c3a36835fa002886', 'department': 'Costume & Make-Up', 'gender': 0, 'id': 23783, 'job': 'Makeup Artist', 'name': 'Heather Plott', 'profile_path': None}

It goes on for a while. Each new dict starts with a credit_id field. One cell can hold several dicts in an array.

Assume I want the names of all Casting directors, as shown in the first entry. I need to check the job entry in each dict and, if it is Casting, grab what is in the name field and store it in the data frame.

I tried a few strategies, then gave up and went for the simple approach. Running the following shut me down, so I can't even access a simple field. How can I get this done in Pandas?

for row in data.head().iterrows():
    if row['crew'].job == 'Casting':
        print(row['crew'])

Edit: error message

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-138-aa6183fdf7ac> in <module>()
      1 for row in data.head().iterrows():
----> 2     if row['crew'].job == 'Casting':
      3         print(row['crew'])

TypeError: tuple indices must be integers or slices, not str

Edit: the code first used to get the array of dicts (strings?).

data['crew']

Answer 1

To create a DataFrame from the sample data, run:

import pandas as pd

df = pd.DataFrame(data=[
  { 'credit_id': '54d5356ec3a3683ba0000039', 'department': 'Production',
    'gender': 1, 'id': 494, 'job': 'Casting', 'name': 'Terri Taylor',
    'profile_path': None},
  { 'credit_id': '56407fa89251417055000b58', 'department': 'Sound',
    'gender': 0, 'id': 6745, 'job': 'Music Editor',
    'name': 'Richard Henderson', 'profile_path': None},
  { 'credit_id': '5789212392514135d60025fd', 'department': 'Production',
    'gender': 2, 'id': 9250, 'job': 'Executive In Charge Of Production',
    'name': 'Jeffrey Stott', 'profile_path': None},
  { 'credit_id': '57892074c3a36835fa002886', 'department': 'Costume & Make-Up',
    'gender': 0, 'id': 23783, 'job': 'Makeup Artist',
    'name': 'Heather Plott', 'profile_path': None}])

Then you can get the data with a single instruction:

df[df.job == 'Casting'].name

The result is:

0    Terri Taylor
Name: name, dtype: object

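The result above is a pandas Series containing the names found. Here, 0 is the index value of the matching record and Terri Taylor is the name of the Casting Director (the only one in your data).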
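The DataFrame above is typed in by hand from the sample. If each cell of data['crew'] really is a string holding a list of dicts, as the question suggests, one possible way to reach the same shape is sketched below. The sample frame and the names parsed and crew_df are made up for illustration, and the sketch assumes every cell is a valid Python literal containing at least one dict.

```python
import ast

import pandas as pd

# Hypothetical stand-in for the question's data: one string per row in 'crew'.
data = pd.DataFrame({
    "crew": [
        "[{'job': 'Casting', 'name': 'Terri Taylor', 'profile_path': None}, "
        "{'job': 'Music Editor', 'name': 'Richard Henderson', 'profile_path': None}]",
        "[{'job': 'Makeup Artist', 'name': 'Heather Plott', 'profile_path': None}]",
    ]
})

# Turn each string into a real list of dicts (literal_eval only accepts literals).
parsed = data["crew"].apply(ast.literal_eval)

# One dict per row; the index still points back to the original row.
crew = parsed.explode()

# Expand the dicts into columns, keeping that index.
crew_df = pd.DataFrame(crew.tolist(), index=crew.index)

# Same filter as before, now applied to every crew member of every row.
casting = crew_df.loc[crew_df["job"] == "Casting", "name"]
print(casting)
```

Because the index of casting repeats the row labels of data, the names can be matched back to the row they came from.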

Edit

If you only need a list (rather than a Series), run:

df[df.job == 'Casting'].name.tolist()

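The result is ['Terri Taylor'], just a plain list.

I think both of my solutions should be faster than a "conventional" loop based on iterrows().

If you want to compare execution times, you can also try another solution: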

df.query("job == 'Casting'").name.tolist()
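To check the speed claim on your own machine, a rough timing sketch along these lines may help. The 3000-row frame and the helper names with_mask, with_query and with_iterrows are invented for the comparison; actual timings depend on the data and the pandas version, so no figures are given here.

```python
import timeit

import pandas as pd

# Made-up test frame: the sample jobs and names repeated 1000 times.
df = pd.DataFrame({
    "job": ["Casting", "Music Editor", "Makeup Artist"] * 1000,
    "name": ["Terri Taylor", "Richard Henderson", "Heather Plott"] * 1000,
})


def with_mask():
    # Boolean mask, as in the first solution.
    return df[df.job == "Casting"].name.tolist()


def with_query():
    # query(), as in the alternative solution.
    return df.query("job == 'Casting'").name.tolist()


def with_iterrows():
    # Plain Python loop over iterrows(), for comparison.
    return [row["name"] for _, row in df.iterrows() if row["job"] == "Casting"]


for fn in (with_mask, with_query, with_iterrows):
    seconds = timeit.timeit(fn, number=3)
    print(f"{fn.__name__}: {seconds:.4f} s for 3 runs")
```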

==========

As for your code:

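On each iteration, iterrows() returns a pair (a tuple) containing:

the key (index value) of the current row,
a pandas Series holding the content of this row.

That is why indexing the pair with a string raises the TypeError. Your loop should look something like:

for row in df.iterrows():
    if row[1].job == 'Casting':
        print(row[1]['name'])

You can't write row[1].name, because on a Series the name attribute holds the index value of the row, so it shadows the column called name.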
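A small sketch of what iterrows() yields, using a cut-down, two-row version of the sample data (the variable names first_idx, first_row and casting_names are only illustrative). It unpacks the pair directly instead of indexing it with row[1]:

```python
import pandas as pd

df = pd.DataFrame([
    {"job": "Casting", "name": "Terri Taylor"},
    {"job": "Music Editor", "name": "Richard Henderson"},
])

# Take the first pair that iterrows() produces and unpack it.
first_idx, first_row = next(df.iterrows())

print(type(first_row).__name__)  # the row content is a pandas Series
print(first_row.name)            # the row's index value, NOT the 'name' column
print(first_row["name"])         # bracket access reaches the 'name' column

# The same loop with the pair unpacked in the for statement.
casting_names = [
    row["name"] for idx, row in df.iterrows() if row["job"] == "Casting"
]
print(casting_names)
```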

Extracting the value of one dict key in pandas based on another key in the same dict
