Suppose I have a pandas DataFrame like this:
        Word  Ratings
0   TLYSFFPK        1
1  SVLENFVGR        2
2  SVFNHAIRK        3
3  KAGEVFIHK        4
How can I use a regular expression in pandas to select the rows whose Word matches the pattern below, while keeping the DataFrame format? The pattern is: \b.[VIFY][MLFYIA]\w+[LIYVF].[KR]\b
Expected output:
        Word  Ratings
1  SVLENFVGR        2
2  SVFNHAIRK        3
Answer 0 (score: 2)
Demo:
In [2]: df
Out[2]:
        Word  Ratings
0   TLYSFFPK        1
1  SVLENFVGR        2
2  SVFNHAIRH        3
3  KAGEVFIHK        4

In [3]: pat = r'\b.[VIFY][MLFYIA]\w+[LIYVF].[KR]\b'

In [4]: df.Word.str.contains(pat)
Out[4]:
0    False
1     True
2    False
3    False
Name: Word, dtype: bool

In [5]: df[df.Word.str.contains(pat)]
Out[5]:
        Word  Ratings
1  SVLENFVGR        2
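
Note that row 2 in the demo reads SVFNHAIRH, whereas the question has SVFNHAIRK; the trailing H fails the [KR] class, which is why that row comes out False in the demo. As a minimal, self-contained sketch of the same approach on the question's own data (the variable names `mask`, `kept` and `dropped`, and the `na=False` argument, are illustrative additions and not part of the original answer):

```python
import pandas as pd

# Data as given in the question (row 2 ends in K here, not H as in the demo).
df = pd.DataFrame({
    "Word": ["TLYSFFPK", "SVLENFVGR", "SVFNHAIRK", "KAGEVFIHK"],
    "Ratings": [1, 2, 3, 4],
})

pat = r"\b.[VIFY][MLFYIA]\w+[LIYVF].[KR]\b"

# Boolean mask: True where the pattern is found somewhere in Word.
# na=False treats missing values as non-matches (an assumption; this sample has none).
mask = df["Word"].str.contains(pat, na=False)

kept = df[mask]      # rows whose Word matches; index and columns are preserved
dropped = df[~mask]  # the complement, if matching rows should be removed instead

print(kept)
```

With the question's data, `kept` holds rows 1 and 2, which is the expected output. Boolean indexing returns an ordinary DataFrame with the original index labels, so the "DataFrame format" is retained; `~mask` inverts the selection if "filter out" is meant literally.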