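I want to create a regular expression in PHP that searches a text for sentences containing "this" or "that" at least twice (so at least two occurrences of "this", or at least two occurrences of "that").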
We are stuck on this.
Answer 0 (score: 3)
Use this pattern: (\b(?:this|that)\b).*?\1
( # Capturing Group (1)
\b # <word boundary>
(?: # Non Capturing Group
this # "this"
| # OR
that # "that"
) # End of Non Capturing Group
\b # <word boundary>
) # End of Capturing Group (1)
. # Any character except line break
*? # (zero or more)(lazy)
\1 # Back reference to group (1)
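A minimal sketch of how this pattern might be tried out in PHP. The `/` delimiters and the sample strings are assumptions added for illustration; they are not part of the answer. The third sample shows a caveat: the backreference `\1` carries no word boundary of its own, so it can also be satisfied by the start of a longer word such as "thistle".

```php
<?php
// Pattern from the answer above, wrapped in PCRE delimiters for PHP.
$pattern = '/(\b(?:this|that)\b).*?\1/';

// Hypothetical sample strings, chosen only for illustration.
$samples = [
    'this and then this again', // same word twice
    'this and that',            // two different words
    'this is a thistle',        // \1 is satisfied by the start of "thistle"
];

$results = [];
foreach ($samples as $sample) {
    // preg_match() returns 1 on a match and 0 when nothing matches.
    $results[$sample] = preg_match($pattern, $sample);
    printf("%-26s => %d\n", $sample, $results[$sample]);
}
```

The pattern is also case-sensitive as written, so "This ... this" counts as only one occurrence unless the `i` modifier is added.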
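Answer 1 (score: 0)
This is mostly Wiktor's pattern; it deviates by isolating sentences and by omitting leading whitespace characters from the full-string match.
Pattern: /\b[^.?!]*\b(th(?:is|at))\b[^.?!]*(\b\1\b)[^.?!]*\b[.!?]/i
Here is a sample text that demonstrates how the other answers fail to disqualify unwanted matches for "word boundary" or "case-insensitive" reasons. (Demo: the capture group is applied to \b\1\b in the demo to show which substrings qualify the matched sentence.)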
This is nothing.
That is what that will be.
The Indian policeman hit the thief with his lathis before pushing him into the thistles.
This Indian policeman hit the thief with this lathis before pushing him into the thistles. This is that and that.
The Indian policeman hit the thief with this lathis before pushing him into the thistles.
For the official breakdown of the pattern, see the demo link.
In simple terms:
/ #start of pattern
\b #match start of a sentence on a "word character"
[^.?!]* #match zero or more characters not a dot, question mark, or exclamation
\b(th(?:is|at))\b #match whole word "this" or "that" (not thistle)
[^.?!]* #match zero or more characters not a dot, question mark, or exclamation
\b\1\b #match the earlier captured whole word "this" or "that"
[^.?!]* #match zero or more characters not a dot, question mark, or exclamation
\b #match second last character of sentence as "word character"
[.!?] #match the end of a sentence: dot, question mark, exclamation
/ #end of pattern
i #make pattern case-insensitive
The pattern will match three of the six sentences in the sample text above:
That is what that will be.
This Indian policeman hit the thief with this lathis before pushing him into the thistles.
This is that and that.
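As a rough sketch, the pattern can be run against the sample text with `preg_match_all()`. The variable names are assumptions made for illustration, and the text is copied from the sample lines above.

```php
<?php
// Pattern from this answer, including the /i (case-insensitive) modifier.
$pattern = '/\b[^.?!]*\b(th(?:is|at))\b[^.?!]*(\b\1\b)[^.?!]*\b[.!?]/i';

// Sample text from the answer, kept verbatim in a nowdoc.
$text = <<<'TEXT'
This is nothing.
That is what that will be.
The Indian policeman hit the thief with his lathis before pushing him into the thistles.
This Indian policeman hit the thief with this lathis before pushing him into the thistles. This is that and that.
The Indian policeman hit the thief with this lathis before pushing him into the thistles.
TEXT;

// $matches[0] holds the full-string matches, i.e. the qualifying sentences.
preg_match_all($pattern, $text, $matches);
$sentences = $matches[0];

foreach ($sentences as $sentence) {
    echo $sentence, "\n";
}
```

Because each match starts at a word boundary, the sentences come back without leading whitespace, so no trimming step should be needed.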
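*Note: previously I used \s*\K at the start of the pattern to omit the whitespace characters. I chose to change my pattern to use additional word boundary metacharacters for improved efficiency. If this does not work for your project text, it is best to revert to my original pattern.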
Answer 2 (score: -1)
Use this:
.*(this|that).*(this|that).*
Update:
Here is another approach, based on your regex:
.*(this\s?|that\s?){2,}.*[\.\n]*
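For comparison, here is a small sketch of how these two patterns behave in PHP. The delimiters, variable names, and sample strings are assumptions added for illustration. The first pattern also accepts one "this" plus one "that", while the second only accepts occurrences that directly follow each other.

```php
<?php
// Both patterns from this answer, wrapped in PCRE delimiters.
$loose    = '/.*(this|that).*(this|that).*/';
$adjacent = '/.*(this\s?|that\s?){2,}.*[\.\n]*/';

// Hypothetical sample strings, chosen only for illustration.
$samples = ['this and this', 'this and that', 'that that is'];

$results = [];
foreach ($samples as $sample) {
    // Store [loose result, adjacent result]; 1 means matched, 0 means not.
    $results[$sample] = [
        preg_match($loose, $sample),
        preg_match($adjacent, $sample),
    ];
    printf("%-14s loose=%d adjacent=%d\n", $sample, ...$results[$sample]);
}
```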