Question

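I'm trying to create a regular expression in Python that lets me find a word "n" times in a string.

For example, suppose I want an expression that matches only if the word "cat" appears exactly twice. How would I do that?

It should accept "The blue cat talks to the red cat in the tree", because it contains "cat" twice.

But it should not accept "The cat is big", because it contains "cat" only once.

It should not accept "The dog is yellow" either, for a similar reason.

Thanks a lot.

EDIT: Hey guys,

Sorry for complicating the question, but I forgot to mention one thing.

If I want to find "cat" twice, "catcat run" should also match.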

Answer 1

Don't use regular expressions just because they're there.

words = text.split()
print(words.count('cat'))

As Vincent pointed out, this assumes that all words are separated by whitespace.

words = re.findall(r"\b\w+", text)

might be a better choice. Whether that is necessary, though, depends on details you didn't give in your post.
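To make the difference concrete, here is a small sketch. The sample sentence is made up for illustration and is not from the question.

```python
import re

# Hypothetical sample text with punctuation attached to the words.
text = "The blue cat, the red cat."

# split() only breaks on whitespace, so the tokens are "cat," and "cat."
# and neither one equals "cat".
split_count = text.split().count("cat")

# \w+ extracts runs of word characters only, dropping the punctuation.
regex_count = re.findall(r"\b\w+", text).count("cat")

print(split_count, regex_count)
```

So if your input can contain punctuation, the split() approach will undercount.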

Edit

If you don't even care about word boundaries, there is even less reason to use a regex.

print(text.count("cat"))

Answer 2

findall + len looks like one way to do it.
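A minimal sketch of that idea, assuming you want whole-word matches. The helper name has_word_n_times is just something I made up for illustration.

```python
import re

def has_word_n_times(text, word, n):
    """Return True if `word` occurs exactly n times as a whole word."""
    # re.escape guards against regex metacharacters in the word.
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text)) == n

print(has_word_n_times("The blue cat talks to the red cat in the tree", "cat", 2))
print(has_word_n_times("The cat is big", "cat", 2))
```

For the "catcat run" case from the edit, drop the two \b anchors so that findall counts every non-overlapping "cat" substring.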

Answer 3

How about this:

re.match(r'(.*\bcat\b){2}', 'The blue cat talks to the red cat in the tree')

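{2} means "repeat 2 times". Use {7} to repeat 7 times. \b is a word boundary; here, the cat in "the blue cat talks" will match, but the one inside "verification" won't. .* matches any string.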
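One thing worth checking before relying on this: because .* can swallow extra occurrences, the pattern appears to behave as "at least 2" rather than "exactly 2". A quick sketch (the three-cat test string is my own):

```python
import re

pattern = re.compile(r'(.*\bcat\b){2}')

# Only one "cat": the group cannot repeat twice, so there is no match.
one_cat = pattern.match('The cat is big')

# Three "cat"s: still matches, since .* is free to consume the extra one.
three_cats = pattern.match('cat cat cat')

print(one_cat is not None, three_cats is not None)
```

If you need exactly n and no more, count the matches instead or use a pattern that forbids the word in the filler.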

You may want to take a look at the re documentation.

Answer 4

Just build a regex in which the multiple instances of "cat" are separated by a group that consumes the other characters:

>>> import re
>>> n = 2
>>> regex = re.compile('.*'.join([r'\bcat\b'] * n))
>>> regex.search('The cat is big')
>>> regex.search('The blue cat talks to the red cat in the tree')
<_sre.SRE_Match object at 0x17ca1a8>

Answer 5

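If you want a single regex that ensures the string contains exactly 2 instances of the word "cat" (no more, no fewer, and not counting "catastrophic" or "catcat"), then the following test script will do the trick: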

import re
text = r'The cat chased its cat toy, but failed to catch it.'
if re.match(r"""
    # Match string containing exactly n=2 "cat" words.
    ^                    # Anchor to start of string.
    (?:                  # Group for specific word count.
      (?:(?!\bcat\b).)*  # Zero or more non-"cat" chars,
      \bcat\b            # followed by the word "cat",
    ){2}                 # exactly n=2 times.
    (?:(?!\bcat\b).)*    # Zero or more non-"cat" chars.
    \Z                   # Anchor to end of string.
    """, text, re.DOTALL | re.VERBOSE):
    # Match attempt successful.
    print("Match found")
else:
    # Match attempt failed.
    print("No match found")

However, if you do want to match the cat in "catastrophic" and "catcat", then remove all the \b word boundary anchors from the regex.
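If you need this for an arbitrary word and count, the same pattern can be generated. This is only a sketch under my own assumptions: the function name exact_count_regex and the whole_word flag are hypothetical, not part of the re module.

```python
import re

def exact_count_regex(word, n, whole_word=True):
    """Build a regex matching strings with exactly n occurrences of word."""
    boundary = r"\b" if whole_word else ""
    target = boundary + re.escape(word) + boundary
    # Filler: any run of characters at which the target does not start.
    filler = r"(?:(?!" + target + r").)*"
    pattern = r"^(?:" + filler + target + r"){%d}" % n + filler + r"\Z"
    return re.compile(pattern, re.DOTALL)

text = 'The cat chased its cat toy, but failed to catch it.'
print(bool(exact_count_regex("cat", 2).match(text)))
print(bool(exact_count_regex("cat", 2, whole_word=False).match("catcat run")))
```

With whole_word=False the "cat" inside "catch" also counts, so the sample sentence above would need n=3 to match.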
