Question

The keywords are "*OR" or "*AND".

Suppose I have the following string:

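This is a t3xt with special characters like !#. *AND and this is another text with special characters *AND this repeats *OR do not repeat *OR have more strings *AND finish with this string.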

I want the following:

group1 "This is a t3xt with special characters like !#."  
group2 "*AND"  
group3 "and this is another text with special characters"  
group4 "*AND"  
group5 "this repeats"  
group6 "*OR"  
group7 "do not repeat"  
group8 "*OR"  
group9 "have more strings"  
group10 "*AND"  
group11 "finish with this string."

I tried it like this:

(.+?)(\*AND\*OR)

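But it only gets the first string, and then I have to keep repeating the code to collect the other strings. The problem is that some strings have only one *AND, or only one *OR, or dozens of them; it's fairly random. The following regex doesn't work either: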

((.+?)(\*AND\*OR))+

For example:

This is a t3xt with special characters like !#. *AND and this is another text with special characters

Answer 1

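PHP has a preg_split function for exactly this kind of thing. preg_split splits a string on a delimiter, and the delimiter can be defined as a regex pattern. It also takes a flag (PREG_SPLIT_DELIM_CAPTURE) that includes the matched delimiters in the result.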

So instead of writing a regex that matches the full text, you write the regex for the delimiter itself.

Example:

$string = "This is a t3xt with special characters like !#. *AND and this is another text with special characters *AND this repeats *OR do not repeat *OR have more strings *AND finish with this string.";
// The capture group plus PREG_SPLIT_DELIM_CAPTURE keeps each *AND / *OR in the result
$parts = preg_split('~(\*(?:AND|OR))~', $string, 0, PREG_SPLIT_DELIM_CAPTURE);
print_r($parts);

Output:

Array
(
    [0] => This is a t3xt with special characters like !#. 
    [1] => *AND
    [2] =>  and this is another text with special characters 
    [3] => *AND
    [4] =>  this repeats 
    [5] => *OR
    [6] =>  do not repeat 
    [7] => *OR
    [8] =>  have more strings 
    [9] => *AND
    [10] =>  finish with this string.
)
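The pieces above keep the spaces next to each keyword, while the desired output in the question has none. A small variation, not part of the original answer, is to let the delimiter pattern also consume the surrounding whitespace so that only the keyword itself is captured. This is a sketch under assumptions: the helper name splitOnKeywords() is hypothetical, and it only strips whitespace adjacent to a keyword (not at the very start or end of the whole string). PREG_SPLIT_NO_EMPTY drops empty pieces in case a string starts or ends with a keyword. The same call also handles the shorter, single-*AND example from the question.

```php
<?php
// Hypothetical helper: split on *AND / *OR, keep the keywords,
// and drop the whitespace that sits right next to each keyword.
function splitOnKeywords(string $text): array
{
    // \s* on both sides of the capture group swallows the spaces around each
    // keyword; only the keyword itself is captured and returned as a delimiter.
    $parts = preg_split(
        '~\s*(\*(?:AND|OR))\s*~',
        $text,
        -1, // no limit
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    // preg_split() returns false on failure; treat that as "no pieces".
    return $parts === false ? [] : $parts;
}

$long  = "This is a t3xt with special characters like !#. *AND and this is another text with special characters *AND this repeats *OR do not repeat *OR have more strings *AND finish with this string.";
$short = "This is a t3xt with special characters like !#. *AND and this is another text with special characters";

print_r(splitOnKeywords($long));  // 11 elements, keywords at the odd indexes
print_r(splitOnKeywords($short)); // 3 elements: text, *AND, text
```

Because the keywords always land at the odd indexes, the result maps directly onto group1 ... group11 from the question.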

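However, if you really want to stick with preg_match, you need preg_match_all. It works like preg_match (which is what you tagged in the question), except that it performs a global/repeated match.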

Example:

$string = "This is a t3xt with special characters like !#. *AND and this is another text with special characters *AND this repeats *OR do not repeat *OR have more strings *AND finish with this string.";
preg_match_all('~(?:(?:(?!\*(?:AND|OR)).)+)|(?:\*(?:AND|OR))~',$string,$matches);
print_r($matches);

Output:

Array
(
    [0] => Array
        (
            [0] => This is a t3xt with special characters like !#. 
            [1] => *AND
            [2] =>  and this is another text with special characters 
            [3] => *AND
            [4] =>  this repeats 
            [5] => *OR
            [6] =>  do not repeat 
            [7] => *OR
            [8] =>  have more strings 
            [9] => *AND
            [10] =>  finish with this string.
        )

)

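First, note that unlike preg_split, preg_match_all (and preg_match) returns a multidimensional array rather than a one-dimensional one. Second, the pattern I used could technically be simplified a bit, but at the cost of having to reference several arrays within the returned multidimensional array (one for the matched text, another for the matched delimiters), which you would then have to loop through, alternating between them. IOW, extra cleanup would be needed to end up with a single array containing both match sets, as shown above.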
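To make that extra cleanup concrete, here is a sketch of the two-array route. The pattern is my own assumption, not necessarily the simplification the answer has in mind: it is essentially the question's `(.+?)` attempt with the keywords written as an alternation (the question's `\*AND\*OR` only matches the literal text `*AND*OR`), an end-of-string `$` branch for the last piece, and preg_match_all instead of preg_match so the match repeats. Group 1 ends up in `$matches[1]` and group 2 in `$matches[2]`, so the two lists have to be interleaved by hand.

```php
<?php
$string = "This is a t3xt with special characters like !#. *AND and this is another text with special characters *AND this repeats *OR do not repeat *OR have more strings *AND finish with this string.";

// Group 1: the shortest run of text up to the next keyword (or the end).
// Group 2: the keyword itself, or an empty string for the final piece ($).
// The \s* on either side of group 1 keeps the spaces out of the captured text.
preg_match_all('~\s*(.+?)\s*(\*(?:AND|OR)|$)~', $string, $matches);

// $matches[1] holds the texts and $matches[2] the keywords in two separate
// arrays, so they are interleaved back into a single list here.
$sequence = [];
foreach ($matches[1] as $i => $text) {
    $sequence[] = $text;
    if ($matches[2][$i] !== '') { // the last match ends at $ and has no keyword
        $sequence[] = $matches[2][$i];
    }
}

print_r($sequence);
```

The loop is exactly the overhead that preg_split with PREG_SPLIT_DELIM_CAPTURE avoids.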

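I only show this approach because you technically asked for it in your question, but I recommend preg_split: it saves you much of that overhead, and handling this kind of situation better is why it was created in the first place.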

Regex to capture groups between repeated words
