Question

I am trying to select the rows of a DataFrame df that match a list of keys, IDs.

df.loc[IDs]

In rare cases the keys are not contained in the DataFrame, and a KeyError is raised:

KeyError: "None of [['001U000001c6OczIAE' '001U000000fgVR9IAM' '0015800000ecNcjAAE'\n '001U000000fgVRDIA2']] are in the [index]"

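Is there an easy way to get the list of missing keys from that exception? Or would it be better/cleaner/more Pythonic to do this without exception handling?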

Answer 1

Without more information, it looks like you may have missed a step: have you set the index to the Salesforce ID (assuming that is what these are)?

For example (random data):

df
    a account
0   1     abc
1   3     abc
2   5     abc
3   7     def
4   7     def
5  34      gf
6   3      hj
7  24      hj

lis = ['abc', 'hj']
df.loc[lis]
KeyError: "None of [['abc', 'hj']] are in the [index]"
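One quick way to confirm this diagnosis is to check where the labels actually live. The following is only a sketch that rebuilds the random data above; the variable names labels_in_index and labels_in_column are made up for illustration.

```python
import pandas as pd

# Rebuild the example frame shown above (default integer index 0..7).
df = pd.DataFrame({
    "a": [1, 3, 5, 7, 7, 34, 3, 24],
    "account": ["abc", "abc", "abc", "def", "def", "gf", "hj", "hj"],
})
lis = ["abc", "hj"]

# .loc looks labels up in the index, which here holds integers, not account names.
labels_in_index = pd.Index(lis).isin(df.index)

# The same labels are present in the 'account' column instead.
labels_in_column = pd.Index(lis).isin(df["account"])
```

If the first check is all False and the second is all True, the index simply has not been set to the key column yet.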

For pandas versions before 0.21.0

After setting the index:

df.set_index('account').loc[lis]
          a
account    
abc       1
abc       3
abc       5
hj        3
hj       24

Missing labels should not raise an error; they produce np.nan values instead:

lis = ['abc', 'hj', 'j']
df.set_index('account').loc[lis]
            a
account      
abc       1.0
abc       3.0
abc       5.0
hj        3.0
hj       24.0
j         NaN

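For pandas versions 0.21.0+

You need to use the DataFrame method reindex(). However, reindex does not work when the index contains duplicates (so the example above will not work without de-duplicating first):

(df.set_index('account')
   .groupby(level=0).first()  # de-duplicate index here
   .reindex(lis))             # lis = ['abc', 'hj', 'j'] as above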

           a
account     
abc      1.0
hj       3.0
j        NaN

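Unfortunately, using reindex() after de-duplicating loses data if you want to keep working with the DataFrame. In any case, this is probably not the best way to find which identifiers are missing from the DataFrame.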
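As a possible alternative that avoids both exception handling and de-duplication, the missing keys can be computed directly from the index. This is a minimal sketch on the same random data, not part of the original answer; the names indexed, missing and present are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 3, 5, 7, 7, 34, 3, 24],
    "account": ["abc", "abc", "abc", "def", "def", "gf", "hj", "hj"],
})
lis = ["abc", "hj", "j"]

indexed = df.set_index("account")

# Keys that were requested but do not occur in the index.
missing = pd.Index(lis).difference(indexed.index).tolist()

# Rows whose index label is one of the requested keys; duplicates are kept.
present = indexed[indexed.index.isin(lis)]
```

Here missing holds the keys that are absent and present keeps every matching row, so nothing is dropped by a de-duplication step.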
