Question

Here is my text:

"A popular resource for the Christian community in the Asheville area."
"I love the acting community in the Orange County area."

I want to capture "Asheville" and "Orange County". How can I start the capture from the "the" that is closest to "area"?

Here is my regex:

/the (.+?) area/

It captures:

"Christian community in the Asheville"
"acting community in the Orange County"

Answer 1

Use a (?:(?!the).)+? tempered greedy token:

/the ((?:(?!the).)+?) area/

See the regex demo. It is almost the same as /the ([^t]*(?:t(?!he)[^t]*)*?) area/, but the latter is a bit more efficient because it is an unrolled pattern.

(?:(?!the).)+? matches one or more characters (as few as possible), each of which does not start a the character sequence.
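As a quick check of the two patterns above, the following Ruby sketch runs both against the sample sentences from the question. The names TEMPERED, UNROLLED, SAMPLES and the helper innermost are illustrative and not part of the original answer.

```ruby
# Tempered greedy token: a character is consumed only if "the" does not start there.
TEMPERED = /the ((?:(?!the).)+?) area/
# Unrolled form: runs of non-"t" characters, plus any "t" not followed by "he".
UNROLLED = /the ([^t]*(?:t(?!he)[^t]*)*?) area/

SAMPLES = [
  'A popular resource for the Christian community in the Asheville area.',
  'I love the acting community in the Orange County area.'
].freeze

# Hypothetical helper: returns the first capture group, or nil when nothing matches.
def innermost(text, pattern)
  text[pattern, 1]
end

SAMPLES.each do |s|
  puts "#{innermost(s, TEMPERED).inspect} / #{innermost(s, UNROLLED).inspect}"
end
# "Asheville" / "Asheville"
# "Orange County" / "Orange County"
```

Both patterns agree on these inputs. One difference to keep in mind: [^t] also matches a newline, while . does not unless /m is set.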

To make it safer, add word boundaries so that only whole words are matched:

/\bthe ((?:(?!\bthe\b).)+?) area\b/
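To illustrate why the word boundaries matter, here is a small Ruby sketch. The sentence is made up for this example (it is not from the question), and the constant names LOOSE, STRICT and TEXT are illustrative. "Northern" contains the letters "the", which blocks the pattern without boundaries.

```ruby
LOOSE  = /the ((?:(?!the).)+?) area/
STRICT = /\bthe ((?:(?!\bthe\b).)+?) area\b/

# Made-up sentence: the word "Northern" contains "the" as a substring.
TEXT = 'Welcome to the Northern area.'

# Without boundaries, the token refuses to step over the "the" inside "Northern".
p TEXT[LOOSE, 1]   # => nil
# With boundaries, only the standalone word "the" is forbidden.
p TEXT[STRICT, 1]  # => "Northern"
```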

Ruby demo:

s = 'I love the acting community in the Orange County area.'
puts s[/the ((?:(?!the).)+?) area/,1]
# => Orange County

NOTE: If you expect the match to span multiple lines, do not forget to add the /m modifier:

/the ((?:(?!the).)+?) area/m
                           ^
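A minimal Ruby sketch of the effect of /m, using the question's second sentence with a line break inserted by hand (an assumption made for this example). In Ruby, /m makes . match a newline as well. The constant names are illustrative.

```ruby
# The question's sentence, with a newline added inside the target phrase.
BROKEN_LINE = "I love the acting community in the Orange\nCounty area."

SINGLE_LINE = /the ((?:(?!the).)+?) area/
MULTI_LINE  = /the ((?:(?!the).)+?) area/m

# Without /m the dot stops at the newline, so no match is found.
p BROKEN_LINE[SINGLE_LINE, 1]  # => nil
# With /m the dot also matches "\n", so the capture spans both lines.
p BROKEN_LINE[MULTI_LINE, 1]   # => "Orange\nCounty"
```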

Answer 2

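Use a tempered greedy solution so that the matched text does not contain another the. This way, the match always starts at the last the before area: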

/the (?:(?!the).)+? area/

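(?:(?!the).)+? represents a tempered greedy dot: it matches any character that is not the start of the text the. This is expressed with the negative lookahead (?!the), which asserts that the text the does not begin at the current position. It therefore ensures that the matched span never contains the text the.
This can be further improved by using a capturing group to extract only the text between the and area. Another way is to turn the and area into a lookbehind and a lookahead, but that will be a bit slower than the capturing group.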
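The two variants mentioned above can be sketched in Ruby as follows. The exact lookaround pattern is an assumption about what the answer has in mind, and the constant names are illustrative.

```ruby
SENTENCE = 'I love the acting community in the Orange County area.'

# Plain pattern from this answer: the match includes "the" and "area".
WHOLE = /the (?:(?!the).)+? area/
# Variant 1: capturing group around the tempered token.
WITH_GROUP = /the ((?:(?!the).)+?) area/
# Variant 2 (assumed form): "the " and " area" moved into lookarounds.
WITH_LOOKAROUNDS = /(?<=the )(?:(?!the).)+?(?= area)/

p SENTENCE[WHOLE]             # => "the Orange County area"
p SENTENCE[WITH_GROUP, 1]     # => "Orange County"
p SENTENCE[WITH_LOOKAROUNDS]  # => "Orange County"
```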

Answer 3

(?<=in the)(.*)(?=area)

(?<=) is a lookbehind and (?=) is a lookahead. Each one excludes from the result the string you type after the = sign. In this case, 'in the' and 'area' will be excluded from the result.

(.*) is greedy here, but you can use (.*?) to stop at the first occurrence of the word given in the lookahead.
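A short Ruby sketch of the greedy and lazy forms of this pattern. The sentence is made up so that "area" occurs twice, and the constant names are illustrative. Note that with this pattern the captured text keeps the spaces around it, so strip may be needed.

```ruby
GREEDY = /(?<=in the)(.*)(?=area)/
LAZY   = /(?<=in the)(.*?)(?=area)/

# Made-up sentence containing "area" twice.
TWO_AREAS = 'A popular resource in the Asheville area, near the downtown area.'

# Greedy: runs up to the last "area" in the string.
p TWO_AREAS[GREEDY, 1]      # => " Asheville area, near the downtown "
# Lazy: stops at the first "area".
p TWO_AREAS[LAZY, 1]        # => " Asheville "
# The surrounding spaces are part of the capture; strip removes them.
p TWO_AREAS[LAZY, 1].strip  # => "Asheville"
```

This pattern also relies on the literal text "in the" appearing before the target, so it is less general than the tempered token approach.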