Question

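I need to take a small pandas dataset of values and check another dataset against it to see whether they match. Where they match, the value needs to be replaced.

The small pandas dataset, called unacceptable_indexes: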

   Value    Make
0      1   Honda
1      2   Mazda
2      4  Holden
3      7  Toyota
4      9  Nissan
5     10    Ford

Check whether any of the makes above appear in the dataset named df:

   Tried  Tested   Free     Cost VehicleMake
0  False   False  False  40000.0         Kia
1  False   False  False  40000.0      Holden
2  False   False  False  40000.0         Kia
3  False   False   True  40000.0      Toyota
4  False   False  False  40000.0      Toyota
5  False   False  False  40000.0          VW

If they do, I need to change VehicleMake to "CombinedMakes".

So in the second dataframe, indexes 1 (Holden), 3 (Toyota), and 4 (Toyota) would change to VehicleMake = 'CombinedMakes':

   Tried  Tested   Free     Cost    VehicleMake
0  False   False  False  40000.0            Kia
1  False   False  False  40000.0  CombinedMakes
2  False   False  False  40000.0            Kia
3  False   False   True  40000.0  CombinedMakes
4  False   False  False  40000.0  CombinedMakes
5  False   False  False  40000.0             VW

I tried this, but it doesn't work, and it is also very slow:

df['VehicleMake'] = df['VehicleMake'].replace(df.VehicleMake.isin(unacceptable_indexes.Make), "CombinedMakes")

Any suggestions would be greatly appreciated! Thanks.

Answer 1

Ben Pap's answer is almost correct. It should be

df.loc[df['VehicleMake'].isin(unacceptable_indexes['Make']), 'VehicleMake'] = "CombinedMakes"

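Let me break it down:

1) unacceptable_indexes['Make'] returns the Make column as a pandas Series.

2) The isin function returns a boolean Series marking the relevant rows.

This lets us select the rows whose VehicleMake is unacceptable. (You can try running just df['VehicleMake'].isin(unacceptable_indexes['Make']) to see the result.)

3) The loc function works as df.loc[row/s, column/s]. So we just need to make clear that we are accessing the 'Make' column by its string label, Make, rather than the entire column.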
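
The three steps above can be sketched end to end as follows. This is a minimal example that rebuilds both DataFrames from the tables in the question; the intermediate name is_unacceptable is introduced here only for illustration and does not appear in the original answer.

```python
import pandas as pd

# Data copied from the tables in the question.
unacceptable_indexes = pd.DataFrame({
    "Value": [1, 2, 4, 7, 9, 10],
    "Make": ["Honda", "Mazda", "Holden", "Toyota", "Nissan", "Ford"],
})

df = pd.DataFrame({
    "Tried": [False] * 6,
    "Tested": [False] * 6,
    "Free": [False, False, False, True, False, False],
    "Cost": [40000.0] * 6,
    "VehicleMake": ["Kia", "Holden", "Kia", "Toyota", "Toyota", "VW"],
})

# Steps 1 and 2: boolean Series, True where VehicleMake occurs in the Make column.
# (is_unacceptable is an illustrative name, not from the original answer.)
is_unacceptable = df["VehicleMake"].isin(unacceptable_indexes["Make"])

# Step 3: select rows by the boolean mask and the column by its label, then assign.
df.loc[is_unacceptable, "VehicleMake"] = "CombinedMakes"

print(df)
```

Printing is_unacceptable before the assignment is a quick way to confirm that only rows 1, 3, and 4 are selected.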

Answer 2

df.loc[df['VehicleMake'].isin(unacceptable_indexes['Make']), 'VehicleMake'] = "CombinedMakes"

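This should work. On the left-hand side you select the cells you want, and the value on the right-hand side of the assignment is written into them.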
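
If modifying df in place is not wanted, one possible variation (not part of either answer) is Series.mask, which returns a new Series with the selected entries replaced. The sketch below uses a small stand-in Series and a plain list of makes, both made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in data, shortened from the question's tables.
makes = pd.Series(["Kia", "Holden", "Toyota", "VW"], name="VehicleMake")
unacceptable = ["Holden", "Toyota"]

# mask() replaces entries where the condition is True and leaves the rest alone.
# It returns a new Series; `makes` itself is not modified.
combined = makes.mask(makes.isin(unacceptable), "CombinedMakes")

print(combined.tolist())
```

The result can be assigned back with df['VehicleMake'] = ... when an in-place update is acceptable after all.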

Compare 2 columns in different pandas datasets and replace the value if it exists in the second dataset
