Question

Using Python: how can I make a regex continue only if a positive lookahead has matched at least once?

I am trying to match:

Clinton-Orfalea-Brittingham Fellowship Program

This is the code I am using right now:

import re
import numpy as np

dp2 = r'[A-Z][a-z]+(?:-\w+|\s[A-Z][a-z]+)+'
print(np.unique(re.findall(dp2, tt)))

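It matches this phrase, but it also matches a bunch of other, unrelated words. My idea is that \s[A-Z][a-z]+ should only kick in after -\w+ has been hit at least once (or maybe twice). I would appreciate any ideas.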
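
To see why the pattern over-matches, here is a small sketch. The sample text `tt` is made up for illustration and is not from the question. Because the two alternatives sit in the same repeated group, a plain run of capitalized words such as "New York" satisfies the pattern without any hyphen ever appearing.

```python
import re

dp2 = r'[A-Z][a-z]+(?:-\w+|\s[A-Z][a-z]+)+'

# Hypothetical sample text, not taken from the original question.
tt = "She joined the Clinton-Orfalea-Brittingham Fellowship Program in New York."

# The group is non-capturing, so findall returns the full matches.
matches = re.findall(dp2, tt)
print(matches)
# ['Clinton-Orfalea-Brittingham Fellowship Program', 'New York']
```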

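To clarify: my goal is not to match this particular set of words, but to match the general shape ProperNoun-ProperNoun- (repeated any number of times), followed by non-hyphenated proper nouns.

E.g. Noun-Noun-Noun Noun Noun Noun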

Noun-Noun Noun

Noun-Noun-Noun Noun

Latest iteration:

dp5 = r'(?:[A-Z][a-z]+-?){2,3}(?:\s\w+){2,4}'

Answer 1

The {m,n} notation can be used to force the regex to match only if the preceding expression occurs between m and n times. Maybe something like

(?:[A-Z][a-z]+-?){2,3}\s\w+\s\w+ # matches 'Clinton-Orfalea-Brittingham Fellowship Program'

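If you are looking specifically for "Clinton-Orfalea-Brittingham Fellowship Program", why use a regex to find it at all? Just use word in string. If you are looking for things of the form Name-Name-Name Noun Noun, this should work, but be aware that Name-Name-Name-Name Noun Noun will not be matched in full, and neither will Name-Name-Name Noun Noun Noun (in fact, something like "Alice-Bob-Catherine Program" will match not only that, but also whatever word comes after it!)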
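
These caveats can be checked directly. The sketch below uses the pattern from this answer; the helper name `first_match` and the sample strings are only for illustration.

```python
import re

pattern = re.compile(r'(?:[A-Z][a-z]+-?){2,3}\s\w+\s\w+')

def first_match(text):
    """Return the first matched substring, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None

# The target phrase is matched in full.
print(first_match("Clinton-Orfalea-Brittingham Fellowship Program"))

# Only one word follows the hyphenated part, so the next word is swallowed too.
print(first_match("Alice-Bob-Catherine Program of study"))

# A third trailing word is left out of the match.
print(first_match("Name-Name-Name Noun Noun Noun"))

# With four hyphenated parts, the match starts at the second one.
print(first_match("Name-Name-Name-Name Noun Noun"))
```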

If you are looking for hyphenated proper nouns followed by non-hyphenated proper nouns, I would do this:

# Explanation

import re

RE = r"""(?:        # Begins the group so we can repeat it
        [A-Z][a-z]+ # Matches one capital letter, then one or more lowercase letters
        -?          # Allows a hyphen at the end of the word w/o requiring it
    ){2,3}          # Ends the group and requires the group match 2 or 3 times in a row
    \s\w+           # Matches a whitespace character and the next word
    \s\w+           # Does so again
    # those last two lines could just as easily be (?:\s\w+){2}
"""
RE = re.compile(RE, re.VERBOSE)  # compiles the expression as written, ignoring whitespace and comments
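
For the more general shape described in the question (any number of hyphenated proper nouns, then one or more non-hyphenated ones), one possible approach is to require at least one "-Word" part before any " Word" part is allowed. The pattern below is only a sketch under that assumption and is not taken from the answer; the name `GENERAL` and the sample text are made up.

```python
import re

# Hypothetical pattern: a capitalized word, one or more "-Word" parts,
# then one or more whitespace-separated capitalized words.
GENERAL = re.compile(r'[A-Z][a-z]+(?:-[A-Z][a-z]+)+(?:\s[A-Z][a-z]+)+')

# Hypothetical sample text.
text = ("We compared the Clinton-Orfalea-Brittingham Fellowship Program, "
        "the Alice-Bob Award and New York.")

found = GENERAL.findall(text)
print(found)
# ['Clinton-Orfalea-Brittingham Fellowship Program', 'Alice-Bob Award']
```

"New York" is skipped because the hyphenated part is mandatory. The trade-off is that every word must look like `[A-Z][a-z]+`, so names such as "McDonald" or all-caps acronyms would need a looser character class.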
