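I'm not sure whether a related question already exists. If one does, please let me know; I searched but couldn't find anything.
I want to count the words in a list, except when a word appears three or fewer words after certain other words. The example below is from "Count occurrences of a couple of specific words".
I want to count "foo", "bar", and "baz", except where "no" occurs within the three words before them. In this case, one "bar" and one "foo" should not be counted.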
import re

vocab = ["foo", "bar", "baz"]
exception = ["no"]
s = "foo bar baz no bar quux foo bla bla"
wordcount = dict((x, 0) for x in vocab)
# Counts every occurrence; the "no" exception is not handled yet.
for w in re.findall(r"\w+", s):
    if w in wordcount:
        wordcount[w] += 1
Please help. Thank you very much in advance.
Answer 0 (score: 2)
How about:
vocab = ["foo", "bar", "baz"]
exception = ["no"]
s = "foo bar baz no bar quux foo bla bla"
wordcount = dict((x, 0) for x in vocab)
words = s.split()
i = 0
while i < len(words):
    cur_word = words[i]
    if cur_word in exception:
        # Skip the exception word itself plus the three words after it.
        i += 4
    else:
        if cur_word in vocab:
            wordcount[cur_word] += 1
        i += 1
print(wordcount)  # {'foo': 1, 'bar': 1, 'baz': 1}
It simply exploits the fact that when we encounter "no", we can skip the following three elements.
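The skip-ahead idea can also be wrapped in a function, which makes it easier to try on other inputs. This is only a sketch: the name `count_skipping` and the `window` parameter are assumptions for illustration and do not come from the answer.

```python
def count_skipping(text, vocab, exceptions, window=3):
    """Count vocab words, jumping over `window` words after each exception word."""
    counts = {word: 0 for word in vocab}
    words = text.split()
    i = 0
    while i < len(words):
        if words[i] in exceptions:
            # Jump past the exception word and the `window` words that follow it.
            i += window + 1
        else:
            if words[i] in counts:
                counts[words[i]] += 1
            i += 1
    return counts

result = count_skipping("foo bar baz no bar quux foo bla bla",
                        ["foo", "bar", "baz"], ["no"])
print(result)  # {'foo': 1, 'bar': 1, 'baz': 1}
```

One thing to keep in mind: an exception word that falls inside a skipped stretch is skipped too, so it does not start a window of its own. For "no foo no a b bar" this sketch still counts the final "bar", even though it sits three words after the second "no".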
Answer 1 (score: 1)
Just replace "no" and the three words that follow it with an empty string, then count the words in the resulting string.
>>> import re
>>> s = 'foo bar baz no bar quux foo bla bla'
>>> vocab = ["foo", "bar", "baz"]
>>> exception = ["no"]
>>> wordcount = dict((x, 0) for x in vocab)
>>> m = re.sub(r'(?:^|\s)no(\s+\S+){0,3}', '', s)
>>> for w in re.findall(r"\w+", m):
...     if w in wordcount:
...         wordcount[w] += 1
...
>>> wordcount
{'foo': 1, 'bar': 1, 'baz': 1}
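To see what the substitution leaves behind, it can help to print the intermediate string. The sketch below also builds the pattern from the exception list instead of hard-coding "no". The helper name `strip_exceptions`, the `window` parameter, and the added `(?!\S)` lookahead are assumptions for illustration, not part of the answer. The lookahead keeps a longer word such as "nothing" from being treated as "no".

```python
import re

def strip_exceptions(text, exceptions, window=3):
    # Hypothetical helper; assumes `exceptions` is a non-empty list of words.
    alternatives = "|".join(re.escape(word) for word in exceptions)
    # (?!\S) requires the exception word to end at whitespace or end of text.
    pattern = r"(?:^|\s)(?:%s)(?!\S)(?:\s+\S+){0,%d}" % (alternatives, window)
    return re.sub(pattern, "", text)

remaining = strip_exceptions("foo bar baz no bar quux foo bla bla", ["no"])
print(remaining)  # foo bar baz bla bla
```

The remaining string can then be fed to the same `re.findall(r"\w+", ...)` counting loop.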
Answer 2 (score: 1)
You can actually do this with plain Python strings and lists, no regular expressions needed:
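vocab = ["foo", "bar", "baz"]
ex_list = ["no"]
s = "foo bar baz no bar quux foo bla bla"
words = s.split()
wordcount = dict((x, 0) for x in vocab)
for i, word in enumerate(words):
    # Look at up to three preceding words; max() keeps the slice start
    # from going negative, which would count from the end of the list.
    if any(w in ex_list for w in words[max(0, i - 3):i]):
        continue
    elif word in vocab:
        wordcount[word] += 1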
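Slicing never raises an IndexError, so the loop needs no explicit bounds check. The start of the slice still has to be clamped at 0, because a negative start would count from the end of the list. The loop can then be simplified to:
for i, word in enumerate(words):
    if word in vocab and not any(w in ex_list for w in words[max(0, i - 3):i]):
        wordcount[word] += 1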
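The same rule can also be written without re-scanning a slice for every word, by remembering the index of the most recent exception word. This is a sketch under assumptions: the function name `count_outside_window` and the `window` parameter are made up for illustration, and a word that appears in both lists is treated as an exception word only.

```python
def count_outside_window(text, vocab, exceptions, window=3):
    """Count vocab words that are more than `window` words after any exception word."""
    counts = {word: 0 for word in vocab}
    last_exception = None  # index of the most recent exception word, if any
    for i, word in enumerate(text.split()):
        if word in exceptions:
            last_exception = i
        elif word in counts and (last_exception is None
                                 or i - last_exception > window):
            counts[word] += 1
    return counts

result = count_outside_window("foo bar baz no bar quux foo bla bla",
                              ["foo", "bar", "baz"], ["no"])
print(result)  # {'foo': 1, 'bar': 1, 'baz': 1}
```

Because only the nearest preceding exception word matters, a single index is enough. Every exception word also opens its own window, so overlapping occurrences of "no" are handled.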