Question

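I'm not sure whether a related question already exists. If one does, please let me know. I searched but couldn't find anything.

I want to count occurrences of a list of words, except when certain words appear within the three words before them. The example below is adapted from "Count occurrences of a couple of specific words".

I want to count "foo", "bar", and "baz", except when "no" appears among the three words preceding the word. In this case, one "bar" and one "foo" should not be counted.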

import re

vocab = ["foo", "bar", "baz"]
exception = ["no"]
s = "foo bar baz no bar quux foo bla bla"

# Counts every vocab word; the "no" exception is not handled yet.
wordcount = dict((x, 0) for x in vocab)
for w in re.findall(r"\w+", s):
    if w in wordcount:
        wordcount[w] += 1

Please help me. Thanks very much in advance.

Answer 1

How about this:

vocab = ["foo", "bar", "baz"]
exception= ["no"]
s = "foo bar baz no bar quux foo bla bla"

wordcount = dict((x,0) for x in vocab)

words = s.split()

i = 0
while i < len(words):
    cur_word = words[i]
    if cur_word in exception:
        i += 4  # skip the exception word itself plus the next three words
    else:
        if cur_word in vocab:
            wordcount[cur_word] += 1
        i += 1

print(wordcount)  # {'foo': 1, 'bar': 1, 'baz': 1}

It simply exploits the fact that whenever we encounter "no", we can skip the following 3 elements.
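One thing to watch for: the jump of four assumes that no second "no" falls inside the skipped words. Below is a minimal sketch of the same idea using a countdown instead of an index jump, so that a new "no" restarts the window. The function name `count_with_skip` and the `window` parameter are my own, not part of the answer above.

```python
def count_with_skip(text, vocab, exceptions, window=3):
    """Count vocab words, ignoring the `window` words after each exception word."""
    wordcount = {word: 0 for word in vocab}
    skip = 0  # how many upcoming words are still inside an exception window
    for word in text.split():
        if word in exceptions:
            skip = window  # (re)start the window, even if one is already open
        elif skip > 0:
            skip -= 1  # this word is inside a window, so it is not counted
        elif word in wordcount:
            wordcount[word] += 1
    return wordcount


print(count_with_skip("foo bar baz no bar quux foo bla bla",
                      ["foo", "bar", "baz"], ["no"]))
# {'foo': 1, 'bar': 1, 'baz': 1}
```

With an input like "no a no b c foo", this version does not count "foo", because the second "no" is still within three words of it.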

Answer 2

Just replace "no" and the following three words with an empty string, then count the words in the resulting string.

>>> import re
>>> s = 'foo bar baz no bar quux foo bla bla'
>>> vocab = ["foo", "bar", "baz"]
>>> exception = ["no"]
>>> wordcount = dict((x, 0) for x in vocab)
>>> m = re.sub(r'(?:^|\s)no(\s+\S+){0,3}', '', s)
>>> for w in re.findall(r"\w+", m):
...     if w in wordcount:
...         wordcount[w] += 1
...
>>> wordcount
{'foo': 1, 'bar': 1, 'baz': 1}
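A possible caveat with the pattern above: `no` is not anchored on the right, so it also matches the start of longer words such as "nothing". Here is a sketch of a stricter variant, assuming a `\b` word boundary after `no` is what you want. The sample string is made up for illustration.

```python
import re

sample = "foo nothing bar no baz"

# Original pattern: also eats the "no" at the start of "nothing".
loose = re.sub(r'(?:^|\s)no(\s+\S+){0,3}', '', sample)

# Variant with a word boundary, so only the standalone word "no" matches.
strict = re.sub(r'(?:^|\s)no\b(?:\s+\S+){0,3}', '', sample)

print(repr(loose))   # 'foothing bar'
print(repr(strict))  # 'foo nothing bar'
```

In the strict version, "baz" is still removed because it follows the standalone "no", while "nothing" is left alone.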

Answer 3

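You can actually do this with plain Python string and list operations, no regex needed:

vocab = ["foo", "bar", "baz"]
ex_list = ["no"]
s = "foo bar baz no bar quux foo bla bla"

words = s.split()
wordcount = dict((x, 0) for x in vocab)
for i, word in enumerate(words):
    # Skip the word if an exception word is among the (up to) three words before it.
    # max(0, i - 3) clamps the start: a negative start would wrap around to the end of the list.
    if any(w in ex_list for w in words[max(0, i - 3):i]):
        continue
    elif word in vocab:
        wordcount[word] += 1

Since slicing never raises an IndexError, no separate bounds check is needed, and the loop can be reduced to a single test:

for i, word in enumerate(words):
    if word in vocab and not any(w in ex_list for w in words[max(0, i - 3):i]):
        wordcount[word] += 1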
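If the window size might change, a `collections.deque` with `maxlen` can keep track of the last few words for you, so no index arithmetic is needed. This is only a sketch under that assumption; `count_unless_preceded` and `window` are names I made up.

```python
from collections import Counter, deque


def count_unless_preceded(text, vocab, exceptions, window=3):
    """Count vocab words not preceded by an exception word within `window` words."""
    counts = Counter({word: 0 for word in vocab})  # keep zero entries for unseen words
    vocab_set = set(vocab)
    exception_set = set(exceptions)
    recent = deque(maxlen=window)  # holds at most the last `window` words seen
    for word in text.split():
        # Count only if none of the recent words is an exception word.
        if word in vocab_set and exception_set.isdisjoint(recent):
            counts[word] += 1
        recent.append(word)  # the oldest word drops out automatically
    return counts


result = count_unless_preceded("foo bar baz no bar quux foo bla bla",
                               ["foo", "bar", "baz"], ["no"])
print(dict(result))  # {'foo': 1, 'bar': 1, 'baz': 1}
```

Because the deque starts empty, words near the beginning of the text are handled the same way as all the others.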

Python: count a list of words, unless certain words precede them

3 answers: