Best way to count the number of matches between a list and a string in Python

Time: 2015-12-25 15:58:40

Tags: python regex string list python-2.7

What is the best way to count the number of matches between a list and a string in Python?

For example, if I have this list:

list = ['one', 'two', 'three']

and this string:

line = "some one long. two phrase three and one again"

I want to get 4, because I have:

one 2 times
two 1 time
three 1 time

Based on the answers to this question I tried the code below. It works, but if I add many words to the list (4000 words), I get an error:

import re
word_list = ['one', 'two', 'three']
line = "some one long. two phrase three and one again"
words_re = re.compile("|".join(word_list))
print(len(words_re.findall(line)))

This is my error:

words_re = re.compile("|".join(word_list))
  File "/usr/lib/python2.7/re.py", line 190, in compile

1 answer:

Answer 0: (score: 1)

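If you want case-insensitive matching of whole words that ignores punctuation, split the string, strip the punctuation from each word, and use a dict to store the counts of the words you want to count: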

from string import punctuation

lst = ['one', 'two', 'three']
# One counter per word of interest, all starting at 0.
cn = dict.fromkeys(lst, 0)
line = "some one long. two phrase three and one again"

for word in line.lower().split():
    # Drop leading/trailing punctuation, e.g. "long." -> "long".
    word = word.strip(punctuation)
    if word in cn:
        cn[word] += 1

print(cn)

{'three': 1, 'two': 1, 'one': 2}
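As a variation on the same idea, the per-word counts can also be built with collections.Counter from the standard library. This is only a sketch, not part of the original answer; the names tokens, all_counts and total are introduced here for illustration. It counts every normalized token once and then reads off the words of interest.

```python
from collections import Counter
from string import punctuation

lst = ['one', 'two', 'three']
line = "some one long. two phrase three and one again"

# Normalize each token: lowercase, then strip leading/trailing punctuation.
tokens = (word.strip(punctuation) for word in line.lower().split())

# Count every token in a single pass over the line.
all_counts = Counter(tokens)

# Keep only the words we care about; a Counter returns 0 for missing keys.
cn = {word: all_counts[word] for word in lst}
total = sum(cn.values())

print(cn)
print(total)
```

Note that this counts every token in the line, not only the words in lst, so it uses more memory than the dict.fromkeys version when the line contains many distinct words.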

If you only want the total, use a set with the same logic:

from string import punctuation

st = {'one', 'two', 'three'}
line = "some one long. two phrase three and one again"

print(sum(word.strip(punctuation) in st for word in line.lower().split()))

This makes a single pass over the words after splitting, and set lookups are O(1), so it is significantly more efficient than list.count.
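One difference from the regex in the question is worth seeing side by side: an alternation such as "one|two|three" also matches inside longer words, whereas the split-and-strip approach only counts whole words. The sketch below wraps both in helper functions; count_whole_words and count_substrings are hypothetical names used for illustration, and the re.escape call is an addition that is not in the question's code.

```python
import re
from string import punctuation


def count_whole_words(words, line):
    """Count tokens of `line` equal to one of `words`,
    ignoring case and leading/trailing punctuation."""
    wanted = set(w.lower() for w in words)
    return sum(tok.strip(punctuation) in wanted
               for tok in line.lower().split())


def count_substrings(words, line):
    """Count non-overlapping regex matches, as the question does.
    This is case-sensitive and also matches inside longer words."""
    # re.escape (an addition) guards against regex metacharacters in the words.
    pattern = re.compile("|".join(re.escape(w) for w in words))
    return len(pattern.findall(line))


words = ['one', 'two', 'three']
line = "Someone said: one, two, three... and one again"

print(count_whole_words(words, line))  # "Someone" is not counted
print(count_substrings(words, line))   # the "one" inside "Someone" is counted
```

On this sample line the two functions disagree by one, because of the "one" inside "Someone". On the line from the question they both give 4.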