Counting occurrences of a string across a pandas column

Time: 2017-10-23 15:57:44

Tags: python regex pandas dataframe contains

Consider the following DataFrame:

import pandas as pd
df = pd.DataFrame(["What is the answer", 
                   "the answer isn't here, but the answer is 42" , 
                   "dogs are nice", 
                   "How are you"], columns=['words'])
df
                                         words
0                           What is the answer
1  the answer isn't here, but the answer is 42
2                                dogs are nice
3                                  How are you

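I want to count the occurrences of a given string, which may be repeated several times within a single row.

For example, I want to count how many times the answer appears. I tried: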

df.words.str.contains(r'the answer').count()

I hoped this would give the count, but the output is 4, and I don't understand why. the answer appears 3 times:

What is **the answer**
**the answer** isn't here, but **the answer** is 42

Note: the search string may appear more than once in a row.

1 Answer:

Answer 0: (score: 4)

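You need str.count, which counts the matches within each row. str.contains returns only one boolean per row, and calling .count() on that result counts the non-null entries, so you get the number of rows (4) rather than the number of occurrences.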

In [5285]: df.words.str.count("the answer").sum()
Out[5285]: 3

In [5286]: df.words.str.count("the answer")
Out[5286]:
0    1
1    2
2    0
3    0
Name: words, dtype: int64
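
To make the difference between the two approaches concrete, here is a minimal sketch that rebuilds the DataFrame from the question and compares the three results side by side. The variable names (rows_total, rows_matching, total_occurrences, escaped_total) are only illustrative and are not part of the original post. The last step is an extra precaution rather than something the answer requires: str.count interprets its argument as a regular expression, so a search string that might contain regex metacharacters can be passed through re.escape first.

```python
import re
import pandas as pd

df = pd.DataFrame(["What is the answer",
                   "the answer isn't here, but the answer is 42",
                   "dogs are nice",
                   "How are you"], columns=['words'])

# str.contains gives one True/False per row; .count() then counts
# the non-null entries, i.e. the number of rows.
rows_total = df.words.str.contains(r'the answer').count()

# Summing the booleans gives the number of rows with at least one match.
rows_matching = df.words.str.contains(r'the answer').sum()

# str.count gives the number of matches in each row; .sum() adds them up.
total_occurrences = df.words.str.count('the answer').sum()

# str.count treats the pattern as a regex, so escape the search string
# when it should be matched literally.
escaped_total = df.words.str.count(re.escape('the answer')).sum()

print(rows_total, rows_matching, total_occurrences, escaped_total)
```

For this data the first value is the row count the question ran into, the second is the number of rows containing the phrase, and the last two are the total number of occurrences.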