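The keywords are "*OR" or "*AND".
Suppose I have the following string:
This is a t3xt with special characters like !#. *AND and this is another text with special characters *AND this repeats *OR do not repeat *OR have more strings *AND finish with this string.
I want the following: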
group1 "This is a t3xt with special characters like !#."
group2 "*AND"
group3 "and this is another text with special characters"
group4 "*AND"
group5 "this repeats"
group6 "*OR"
group7 "do not repeat"
group8 "*OR"
group9 "have more strings"
group10 "*AND"
group11 "finish with this string."
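I tried this:
(.+?)(\*AND|\*OR)
But it only captures the first string, so I would have to keep repeating the pattern to collect the others. The problem is that some strings contain a single *AND, some a single *OR, and some dozens of them, so the count is fairly random. The following regex does not work either:
((.+?)(\*AND|\*OR))+
For example:
This is a t3xt with special characters like !#. *AND and this is another text with special characters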
Answer 0 (score: 2)
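PHP has a preg_split function for this kind of thing. preg_split lets you split a string on a delimiter, and the delimiter can be defined as a regex pattern. It also takes a flag, PREG_SPLIT_DELIM_CAPTURE, that includes the captured delimiters in the result alongside the split pieces.
So the regex describes only the delimiter itself, rather than trying to match the whole text.
Example: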
$string = "This is a t3xt with special characters like !#. *AND and this is another text with special characters *AND this repeats *OR do not repeat *OR have more strings *AND finish with this string.";
$string = preg_split('~(\*(?:AND|OR))~',$string,0,PREG_SPLIT_DELIM_CAPTURE);
print_r($string);
Output:
Array
(
[0] => This is a t3xt with special characters like !#.
[1] => *AND
[2] => and this is another text with special characters
[3] => *AND
[4] => this repeats
[5] => *OR
[6] => do not repeat
[7] => *OR
[8] => have more strings
[9] => *AND
[10] => finish with this string.
)
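Note that the pieces above keep whatever spaces surrounded each delimiter. If that matters, one possible variation (a sketch, not part of the original answer) lets the pattern consume whitespace around the delimiter and adds PREG_SPLIT_NO_EMPTY to drop any empty pieces. The variable name $parts is only illustrative.

```php
<?php
$string = "This is a t3xt with special characters like !#. *AND and this is another text with special characters *AND this repeats *OR do not repeat *OR have more strings *AND finish with this string.";

// The \s* on either side sits outside the capture group, so the spaces
// around each delimiter are consumed and only "*AND" / "*OR" is captured.
$parts = preg_split(
    '~\s*(\*(?:AND|OR))\s*~',
    $string,
    -1, // no limit on the number of pieces
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);

print_r($parts);
```

With the sample string this should give the same eleven elements as before, just without leading or trailing spaces.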
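However, if you really want to stick with preg_match (which is what you tagged in the question), you will need preg_match_all. It works like preg_match, except that it performs a global/repeated match.
Example: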
$string = "This is a t3xt with special characters like !#. *AND and this is another text with special characters *AND this repeats *OR do not repeat *OR have more strings *AND finish with this string.";
preg_match_all('~(?:(?:(?!\*(?:AND|OR)).)+)|(?:\*(?:AND|OR))~',$string,$matches);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => This is a t3xt with special characters like !#.
[1] => *AND
[2] => and this is another text with special characters
[3] => *AND
[4] => this repeats
[5] => *OR
[6] => do not repeat
[7] => *OR
[8] => have more strings
[9] => *AND
[10] => finish with this string.
)
)
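First, note that unlike preg_split, which returns a flat array, preg_match_all fills $matches with a multidimensional array (one sub-array for the full matches, plus one per capture group). Second, the pattern I used could technically be simplified a bit, but at the cost of having to reference several sub-arrays in the result (one for the matched text, another for the matched delimiters). You would then have to loop through them and alternate between the two; IOW, there would be extra cleanup to end up with a single array containing both sets of matches, as shown above.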
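To make that trade-off concrete, here is one guess at what such a simplified pattern and its cleanup could look like. The answer does not show it, so the pattern, the $flat variable, and the trim() call are all assumptions made for illustration.

```php
<?php
$string = "This is a t3xt with special characters like !#. *AND and this is another text with special characters *AND this repeats *OR do not repeat *OR have more strings *AND finish with this string.";

// Group 1: a run of text (lazy). Group 2: the delimiter that ends it,
// or the end of the string for the final piece.
preg_match_all('~(.+?)(\*(?:AND|OR)|$)~', $string, $matches);

// $matches[1] holds the texts and $matches[2] the delimiters, as two
// parallel arrays. Interleave them back into a single flat list.
$flat = [];
foreach ($matches[1] as $i => $text) {
    $flat[] = trim($text);          // trimming is an extra, not in the answer
    if ($matches[2][$i] !== '') {   // the last piece has no delimiter after it
        $flat[] = $matches[2][$i];
    }
}

print_r($flat);
```

The loop is exactly the kind of extra bookkeeping that the single preg_split call avoids.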
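I only showed this method because it is technically what you asked for in the question. I recommend preg_split instead: it saves a lot of this overhead, and handling this kind of scenario well is the reason it exists in the first place.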