Here is a sample string:
$string = '<strong>Lorem ipsum dolor</strong> sit <img src="test.png" /> amet <span class="test" style="color:red">consec<i>tet</i>uer</span>.';
I want to split the string into an array, breaking it at every space and at every HTML tag, while leaving spaces inside HTML tags alone. For example:
Array
(
[0] => <strong>
[1] => Lorem
[2] => ipsum
[3] => dolor
[4] => </strong>
[5] => sit
[6] => <img src="test.png" />
[7] => amet
[8] => <span class="test" style="color:red">
[9] => consec
[10] => <i>
[11] => tet
[12] => </i>
[13] => uer
[14] => </span>
[15] => .
)
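But I can't get this to work. I am using preg_split for this, and I think my regular expression is wrong. Below are some of the expressions I tried; none of them gives the result I want.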
$chars = preg_split('/(<[^>]*[^\/]>)/i', $string, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
/* Results */
Array
(
[0] => <strong>
[1] => Lorem ipsum dolor
[2] => </strong>
[3] => sit <img src="test.png" /> amet
[4] => <span class="test" style="color:red">
[5] => consec
[6] => <i>
[7] => tet
[8] => </i>
[9] => uer
[10] => </span>
[11] => .
)
The result of another regular expression:
$chars = preg_split('/\s+(?![^<>]*>)/x', $string, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
/* Results */
Array
(
[0] => <strong>Lorem
[1] => ipsum
[2] => dolor</strong>
[3] => sit
[4] => <img src="test.png" />
[5] => amet
[6] => <span class="test" style="color:red">consec<i>tet</i>uer</span>.
)
And the result of yet another expression (very close):
$chars = preg_split('/\s*(<[^>]*>)/i', $string, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
/* Results */
Array
(
[0] => <strong>
[1] => Lorem ipsum dolor
[2] => </strong>
[3] => sit
[4] => <img src="test.png" />
[5] => amet
[6] => <span class="test" style="color:red">
[7] => consec
[8] => <i>
[9] => tet
[10] => </i>
[11] => uer
[12] => </span>
[13] => .
)
Answer 0 (score: 0)
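You're almost there. You need to change <[^>]*> to the more specific regex <\/?\w+[^<>]*>, and then add an alternative for whitespace, |\s+, so that runs of spaces also act as delimiters. You also don't need the i flag, since the pattern contains no literal letters: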
$chars = preg_split('/(<\/?\w+[^<>]*>)|\s+/', $string, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
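To check the expression against the sample string from the question, here is a minimal self-contained sketch. The function name split_html_words is hypothetical and only wraps the call for illustration, and -1 is passed as the limit, as in the question's own attempts.

```php
<?php
// Hypothetical helper (not part of the original answer): wraps the
// preg_split call so it can be reused and checked.
function split_html_words(string $html): array
{
    // Alternative 1 captures a whole tag; PREG_SPLIT_DELIM_CAPTURE keeps it.
    // Alternative 2 consumes a run of whitespace, which is discarded.
    $pattern = '/(<\/?\w+[^<>]*>)|\s+/';
    $parts = preg_split(
        $pattern,
        $html,
        -1,
        PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
    );

    // preg_split returns false on failure; fall back to an empty array.
    return $parts === false ? [] : $parts;
}

// Sample string taken from the question.
$string = '<strong>Lorem ipsum dolor</strong> sit <img src="test.png" /> amet <span class="test" style="color:red">consec<i>tet</i>uer</span>.';

print_r(split_html_words($string));
```

For the sample string this should give the 16 elements listed in the question, from <strong> through the final period. Because the tag part relies on [^<>]*, it assumes attribute values never contain a literal < or >. For arbitrary HTML, a real parser such as DOMDocument is likely the safer choice.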