Question

This is a follow-up to this Stack Overflow question:

Select by partial string from a pandas DataFrame

which returns rows based on a partial string match:

df[df['A'].str.contains("hello")]

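My question is: how do I return rows that contain multiple instances of a partial string?

For example, what if I want to return all rows where a particular column contains 3 instances of the partial string 'ology'?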

Example:

import pandas as pd
testdf = pd.DataFrame([['test1', 'this is biology mixed with zoology'], ['test2', 'the cat and bat teamed up to find some food'], ['test2', 'anthropology with pharmacology and biology']])

testdf.head()


>0  1
>0  test1   this is biology mixed with zoology
>1  test2   the cat and bat teamed up to find some food
>2  test2   anthropology with pharmacology and biology

testdf = testdf[testdf[1].str.contains("ology")]
testdf.head()

>0  1
>0  test1   this is biology mixed with zoology
>2  test2   anthropology with pharmacology and biology

What I am looking for is rows with 3 instances of "ology", so it would return only the last row:

>2  test2   anthropology with pharmacology and biology

Answer 1

In this case you don't want str.contains; use str.count instead to count the occurrences of ology:

testdf[testdf['Col2'].str.count('ology').eq(3)]

Output:

    Col1                                        Col2
2  test2  anthropology with pharmacology and biology

Note that I named your columns Col1 and Col2.
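
For reference, here is a minimal self-contained sketch of the str.count approach. The column names Col1 and Col2 follow this answer's renaming, and the intermediate names counts, exactly_three and at_least_two are only illustrative. Keeping the counts in their own Series makes it easy to switch between "exactly 3" and "at least N":

```python
import pandas as pd

# Same data as in the question, with the columns named as in this answer.
testdf = pd.DataFrame(
    [
        ["test1", "this is biology mixed with zoology"],
        ["test2", "the cat and bat teamed up to find some food"],
        ["test2", "anthropology with pharmacology and biology"],
    ],
    columns=["Col1", "Col2"],
)

# Number of non-overlapping matches of the pattern in each row.
counts = testdf["Col2"].str.count("ology")

# Rows with exactly three occurrences.
exactly_three = testdf[counts.eq(3)]

# Rows with two or more occurrences (illustrative variation).
at_least_two = testdf[counts.ge(2)]

print(exactly_three)
```

Note that str.count treats its argument as a regular expression, so a literal string containing special characters would need escaping with re.escape.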

Answer 2

To use str.contains, you can pass a regular expression as pat, as follows:

testdf[1].str.contains('(.*ology.*){3}')

Out[29]:
0    False
1    False
2     True
Name: 1, dtype: bool
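
Two caveats about this pattern are worth a sketch. First, it matches rows with at least three occurrences, not exactly three. Second, because the pattern contains a capturing group, pandas may emit a UserWarning about match groups. The variant below is an assumption on my part rather than part of the answer: it uses a non-capturing group (?:...), and the fourth sample row and the names s, pattern, mask and exact are added only for illustration:

```python
import pandas as pd

s = pd.Series(
    [
        "this is biology mixed with zoology",
        "the cat and bat teamed up to find some food",
        "anthropology with pharmacology and biology",
        "biology zoology geology ecology",  # extra row with four occurrences
    ]
)

# Non-capturing group: "ology" followed by anything, repeated three times.
pattern = r"(?:ology.*){3}"

# True wherever the string holds three or more occurrences.
mask = s.str.contains(pattern)

# For an exact count, str.count is the more direct tool.
exact = s.str.count("ology").eq(3)

print(mask.tolist())
print(exact.tolist())
```

The row with four occurrences passes the str.contains test but not the exact-count test, which is the practical difference between the two answers.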
