Question

I have a pandas DataFrame like the one below, with a single column named 'texts':

texts
throne one
bar one
foo two
bar three
foo two
bar two
foo one
foo three
one three

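For each row, I want to check for the three words 'one', 'two' and 'three' and return the number of matches, counting a word only when it appears as a complete word. The expected output looks like this: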

    texts   counts
    throne one  1
    bar one     1
    foo two     1
    bar three   1
    foo two     1
    bar two     1
    foo one     1
    foo three   1
    one three   2

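As you can see, the count for the first row is 1, because 'throne' is not treated as a match for the search value 'one': there, 'one' is not a complete word but just part of 'throne'.

Any help with this?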

Answer 1

Use pd.Series.str.count, joining the words with '|' to build a regular expression:

words = 'one two three'.split()

df.assign(counts=df.texts.str.count('|'.join(words)))

        texts  counts
0  throne one       2
1     bar one       1
2     foo two       1
3   bar three       1
4     foo two       1
5     bar two       1
6     foo one       1
7   foo three       1
8   one three       2
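Note that row 0 comes out as 2 here, not the 1 the question asks for. A minimal sketch with the standard-library re module (the variable names are just for illustration) shows where the extra match comes from:

```python
import re

# Same alternation as above: 'one|two|three'
pattern = '|'.join('one two three'.split())

# Without word boundaries, 'one' also matches inside 'throne'.
matches = re.findall(pattern, 'throne one')
print(matches)       # ['one', 'one']
print(len(matches))  # 2
```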

To make sure the 'one' inside 'throne' is not counted, we can add word boundaries (\b) to the regular expression:

words = 'one two three'.split()

df.assign(counts=df.texts.str.count('|'.join(map(r'\b{}\b'.format, words))))

        texts  counts
0  throne one       1
1     bar one       1
2     foo two       1
3   bar three       1
4     foo two       1
5     bar two       1
6     foo one       1
7   foo three       1
8   one three       2
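For anyone who wants to run this end to end, here is a self-contained sketch. It rebuilds the sample frame from the question and, as a variation that is not part of the snippet above, wraps the whole alternation in a single pair of word boundaries and passes each word through re.escape, on the assumption that the search words might contain regex metacharacters:

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({'texts': [
    'throne one', 'bar one', 'foo two', 'bar three', 'foo two',
    'bar two', 'foo one', 'foo three', 'one three',
]})

words = ['one', 'two', 'three']

# Escape each word, then put one non-capturing group between \b ... \b,
# giving r'\b(?:one|two|three)\b'.
pattern = r'\b(?:{})\b'.format('|'.join(map(re.escape, words)))

result = df.assign(counts=df['texts'].str.count(pattern))
print(result)
```

For plain alphabetic words this should give the same counts as the per-word \b version; the escaping only matters once a word contains characters such as '.' or '+'.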

For extra flair, use a raw f-string (requires Python 3.6 or later):

words = 'one two three'.split()

df.assign(counts=df.texts.str.count('|'.join(fr'\b{w}\b' for w in words)))

        texts  counts
0  throne one       1
1     bar one       1
2     foo two       1
3   bar three       1
4     foo two       1
5     bar two       1
6     foo one       1
7   foo three       1
8   one three       2
