word         0         1         2         3         4         5
</s>  0.001129 -0.000896  0.000319  0.001534  0.001106 -0.001404
in    0.070312  0.086914  0.087891  0.062500  0.069336 -0.108887
for  -0.011780 -0.047363  0.044678  0.063477 -0.018188 -0.063965
that -0.015747 -0.028320  0.083496  0.050293 -0.110352  0.031738
is    0.007050 -0.073242  0.171875  0.022583 -0.132812  0.198242
I have this DataFrame, and I want to get the rows whose index ("word") contains a given string, ignoring case. I tried
df[df.index.str.lower().contains("lebron") == True]
which raises KeyError: False, whereas
df[df.index.str.contains("Lebron") == True]
works fine.
How can I use lower() in this case?
Answer 0: (score: 3)
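lower is not needed here. For a case-insensitive search, pass case=False to str.contains. Add na=False if the index may contain missing values, and regex=False if you want a literal substring match rather than a regular-expression search: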
df[df.index.str.contains("lebron", case=False)]
df[df.index.str.contains("lebron", case=False, na=False, regex=False)]
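To make the effect of case=False concrete, here is a minimal sketch on toy data. The DataFrame contents and the names mask and result are made up for illustration; only str.contains and its case, na and regex parameters come from the answer above.

```python
import pandas as pd

# Hypothetical toy data; the index stands in for the "word" index in the question.
df = pd.DataFrame(
    {"0": [0.1, 0.2, 0.3, 0.4]},
    index=pd.Index(["LeBron", "lebron_james", "in", "for"], name="word"),
)

# case=False ignores case; regex=False treats "lebron" as a literal substring;
# na=False maps missing index values (none here) to False.
mask = df.index.str.contains("lebron", case=False, na=False, regex=False)

result = df[mask]
print(result.index.tolist())  # ['LeBron', 'lebron_james']
```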
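However, if you do want to chain str.lower() with str.contains(), note that str.lower() returns a plain Index, so the .str accessor is needed again before contains. Comparing with True is not necessary: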
df[df.index.str.lower().str.contains("lebron")]
df[df.index.str.lower().str.contains("lebron", na=False, regex=False)]
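A short sketch of why the second .str matters, assuming a toy DataFrame (the data and the names lowered and mask are illustrative only). The intermediate result of str.lower() is an Index, and the string methods live on its .str accessor.

```python
import pandas as pd

# Hypothetical toy data with a "word" index.
df = pd.DataFrame(
    {"0": [1.0, 2.0, 3.0]},
    index=pd.Index(["LeBron", "in", "for"], name="word"),
)

# str.lower() returns a new Index of lower-cased labels, not a string accessor.
lowered = df.index.str.lower()
print(isinstance(lowered, pd.Index))

# The vectorised contains method is reached through .str on that Index.
mask = lowered.str.contains("lebron", regex=False)
print(df[mask].index.tolist())
```

The original labels are preserved in the result, because lowered is only used to build the boolean mask.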
Answer 1: (score: 3)
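If you are doing a simple substring check, I recommend avoiding a regex-based comparison (i.e., not using str.contains with a regular expression). You can use a list comprehension here: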
df[['lebron' in x.lower() for x in df.index]]
If the index may contain NaN, the solution can be modified to handle them:
df[[pd.notna(x) and 'lebron' in x.lower() for x in df.index]]
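As a rough illustration of the NaN-safe list comprehension, the sketch below builds a toy DataFrame with one missing index label (the data and the name mask are assumptions for the example). The pd.notna(x) check short-circuits, so x.lower() is never called on NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; one index label is missing.
df = pd.DataFrame(
    {"0": [1.0, 2.0, 3.0]},
    index=pd.Index(["LeBron", np.nan, "for"], name="word"),
)

# pd.notna(x) is evaluated first, so NaN labels yield False without calling .lower().
mask = [pd.notna(x) and "lebron" in x.lower() for x in df.index]

print(mask)
print(df[mask].index.tolist())
```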
Alternatively, str.contains can be used with regex disabled:
df[df.index.str.lower().str.contains("lebron", regex=False)]
If you have no NaNs, you can omit the trailing == True. Otherwise,
df[df.index.str.lower().str.contains("lebron", regex=False) == True]
works fine.
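For comparison, a small sketch on toy data with a missing index label (the data and the names mask_eq and mask_na are illustrative). It shows the == True form next to the na=False form from the other answer; on this example both produce the same boolean mask.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; one index label is missing.
df = pd.DataFrame(
    {"0": [1.0, 2.0, 3.0]},
    index=pd.Index(["LeBron", np.nan, "for"], name="word"),
)

# Comparing with True turns any non-True entry (including NaN) into False.
mask_eq = df.index.str.lower().str.contains("lebron", regex=False) == True

# na=False asks str.contains to report False for missing values directly.
mask_na = df.index.str.lower().str.contains("lebron", regex=False, na=False)

print(list(mask_eq) == list(mask_na))
print(df[mask_na].index.tolist())
```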