Question

Given the following text:

Lorem ipsum dolor 
sit amet, consectetur 
adipiscing elit.

Phasellus id 
tristique est.

Mauris eget massa leo.
Pellentesque egestas 
ante vitae finibus luctus. 

Nam tristique metus 
nec semper semper.

Is it possible to match, with a regular expression, the 2 blocks that contain the string tristique?

So these would be the 2 matches:

Phasellus id 
tristique est.

Nam tristique metus 
nec semper semper.

Answer 1

You can try the regex below.

(?s)\b(?:(?!\n\n).)*?\btristique\b(?:(?!\n\n).)*

DEMO

(?:(?!\n\n).)* matches any character zero or more times, stopping before any \n\n sequence (that is, before a blank line).
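For illustration, here is a minimal sketch of how this pattern could be applied in JavaScript. It assumes an engine that supports the `s` (dotAll) flag, used here in place of the inline `(?s)` modifier, and the names `text`, `blockRegex`, and `blocks` are just placeholders for this example.

```javascript
// Sample text from the question (trailing spaces omitted).
const text = [
  "Lorem ipsum dolor",
  "sit amet, consectetur",
  "adipiscing elit.",
  "",
  "Phasellus id",
  "tristique est.",
  "",
  "Mauris eget massa leo.",
  "Pellentesque egestas",
  "ante vitae finibus luctus.",
  "",
  "Nam tristique metus",
  "nec semper semper."
].join("\n");

// The `s` flag stands in for (?s): "." also matches "\n".
// The `g` flag makes match() return every block, not only the first.
const blockRegex = /\b(?:(?!\n\n).)*?\btristique\b(?:(?!\n\n).)*/gs;

// match() returns null when nothing matches, hence the fallback.
const blocks = text.match(blockRegex) || [];
console.log(blocks);
```

On this sample, `blocks` should contain the two paragraphs that include "tristique". Note that each match starts at the first word boundary of its paragraph, so leading punctuation or whitespace would not be part of the match.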

Answer 2

A reasonable approach is to split the string into paragraphs (on \n\n+) and then find the paragraphs that contain "tristique". This is probably the fastest way.

JavaScript example:

var result = text.split(/^\n+|\n\n+/).filter(function (elt) {
    return /\btristique\b/.test(elt);
});
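The same split-and-filter idea can be wrapped in a small reusable function. This is only a sketch: `findBlocks` and `sample` are hypothetical names, and the search word is escaped on the assumption that it should be matched literally rather than as a regex.

```javascript
// Hypothetical helper: returns the blank-line-separated blocks of `text`
// that contain `word` as a whole word.
function findBlocks(text, word) {
  // Escape regex metacharacters so `word` is matched literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // No `g` flag, so test() carries no state between calls.
  const wordRegex = new RegExp("\\b" + escaped + "\\b");
  return text
    .split(/\n\n+/)            // split on one or more blank lines
    .filter(function (block) { // keep only blocks containing the word
      return wordRegex.test(block);
    });
}

const sample =
  "Lorem ipsum.\n\nPhasellus id\ntristique est.\n\n" +
  "Mauris eget.\n\nNam tristique metus\nnec semper.";
console.log(findBlocks(sample, "tristique"));
```

Because `\b` is defined in terms of word characters, this sketch is best suited to search words that begin and end with a letter, digit, or underscore.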

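To do the same thing in a single pass while preventing a lot of backtracking, you need advanced regex features that are not available in JavaScript. An example in PHP: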

$pattern = <<<'EOD'
~^
# non-empty lines without the target word
(?:
    (?=\N) #check if there is one character
    # content without the target word
    [^t\n]*+ #all until a "t" or a newline
    (?: \Bt+[^t\n]* | t+(?!ristique\b)[^t\n]* )*+ #when a "t" is met
    \n #a newline
)*+

# characters until the target word
[^t\n]*+
(?: \Bt+[^t\n]* | t+(?!ristique\b)[^t\n]* )*+
(*SKIP) # if the target word doesn't follow then skip the substring

tristique  # target: note that word boundaries are implicit
\N*        # trailing characters
(?:\n\N+)* # following non empty lines
~mx
EOD;

if (preg_match_all($pattern, $text, $matches)) {
// do what you have to do
}
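Since JavaScript lacks the possessive quantifiers and backtracking verbs used above, one way to get a similar backtracking-free result there is to drop the big regex and scan the text line by line. The following is a rough sketch of that swapped-in technique, not a translation of the PHP pattern; `scanBlocks` is a hypothetical name, and it uses a plain substring test, so unlike the patterns above it does not check word boundaries.

```javascript
// Hypothetical helper: walks the lines once and collects the blocks
// (runs of non-empty lines) in which `word` occurs.
function scanBlocks(text, word) {
  const result = [];
  let current = [];    // lines of the block being read
  let hasWord = false; // whether the current block contains `word`

  // Close the current block, keeping it only if the word was seen.
  function flush() {
    if (current.length > 0 && hasWord) {
      result.push(current.join("\n"));
    }
    current = [];
    hasWord = false;
  }

  for (const line of text.split("\n")) {
    if (line === "") {
      flush(); // an empty line ends the block
    } else {
      current.push(line);
      // Assumption: substring match only, no word-boundary check.
      if (line.includes(word)) hasWord = true;
    }
  }
  flush(); // the last block has no trailing empty line
  return result;
}

console.log(scanBlocks("a\nb\n\nc tristique\nd\n\ne", "tristique"));
```

Each line is examined a fixed number of times, so the cost grows linearly with the length of the input.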

How can I select a block of text that contains a string using a regular expression?
