Return the count of multiple words present in a pandas column

Asked: 2018-04-05 15:52:24

Tags: python string pandas

I have a pandas DataFrame like the one below, with a column named 'texts':

texts
throne one
bar one
foo two
bar three
foo two
bar two
foo one
foo three
one three

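For each row, I want to count occurrences of the three words 'one', 'two', and 'three', counting a match only when it is a whole word. The expected output looks like this: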

    texts   counts
    throne one  1
    bar one     1
    foo two     1
    bar three   1
    foo two     1
    bar two     1
    foo one     1
    foo three   1
    one three   2

As you can see, the count for the first row is 1: 'throne' should not be counted, because the 'one' inside it is not a whole word but just part of 'throne'.

Any help with this?

1 Answer:

Answer 0: (score: 7)

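Use pd.Series.str.count, joining words with '|' so that the regular expression matches any one of them. Note that this first version also counts the 'one' inside 'throne':
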
words = 'one two three'.split()

df.assign(counts=df.texts.str.count('|'.join(words)))

        texts  counts
0  throne one       2
1     bar one       1
2     foo two       1
3   bar three       1
4     foo two       1
5     bar two       1
6     foo one       1
7   foo three       1
8   one three       2
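
The 2 in row 0 comes from the unanchored pattern: 'one|two|three' matches the 'one' inside 'throne' as well as the standalone word. A minimal sketch using the standard `re` module (not part of the original answer, and assuming that `str.count` counts regex matches per element in the same way) makes the two matches visible:

```python
import re

# Same pattern as above: a plain alternation with no word boundaries
pattern = '|'.join('one two three'.split())  # 'one|two|three'

# findall returns every non-overlapping match, including matches
# that are only a substring of a longer word such as 'throne'
matches = re.findall(pattern, 'throne one')
print(matches)  # ['one', 'one']
```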

To make sure 'throne' is not counted, we can add word boundaries (\b) around each word in the regular expression:

words = 'one two three'.split()

df.assign(counts=df.texts.str.count('|'.join(map(r'\b{}\b'.format, words))))

        texts  counts
0  throne one       1
1     bar one       1
2     foo two       1
3   bar three       1
4     foo two       1
5     bar two       1
6     foo one       1
7   foo three       1
8   one three       2
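
If the word list might contain regex metacharacters (for example '.' or '+'), it may be safer to escape each word first. The sketch below is a variation on the approach above, not the answer's own code: it uses `re.escape` and wraps the whole alternation in a single non-capturing group between two word boundaries. The shortened sample frame is an assumption made for illustration.

```python
import re
import pandas as pd

# Shortened sample frame (assumed for illustration)
df = pd.DataFrame({'texts': ['throne one', 'bar one', 'foo two', 'one three']})
words = ['one', 'two', 'three']

# Escape each word so metacharacters are matched literally, then place
# one \b on each side of a non-capturing group holding all alternatives
pattern = r'\b(?:{})\b'.format('|'.join(map(re.escape, words)))

result = df.assign(counts=df.texts.str.count(pattern))
print(result)
```

Note that \b only marks a whole-word match as expected when each search word begins and ends with a word character (letter, digit, or underscore).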

For extra flair, in Python 3.6+ you can use a raw f-string:

words = 'one two three'.split()

df.assign(counts=df.texts.str.count('|'.join(fr'\b{w}\b' for w in words)))

        texts  counts
0  throne one       1
1     bar one       1
2     foo two       1
3   bar three       1
4     foo two       1
5     bar two       1
6     foo one       1
7   foo three       1
8   one three       2
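
The snippets above assume that `df` already exists. A self-contained sketch that rebuilds the frame from the question's sample data and applies the raw f-string pattern could look like this (the variable names `pattern` and `out` are chosen here for illustration):

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({'texts': [
    'throne one', 'bar one', 'foo two', 'bar three', 'foo two',
    'bar two', 'foo one', 'foo three', 'one three',
]})

words = 'one two three'.split()

# Raw f-string (Python 3.6+): \b stays a regex word boundary
# while {w} is replaced by each word
pattern = '|'.join(fr'\b{w}\b' for w in words)

out = df.assign(counts=df.texts.str.count(pattern))
print(out)
```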