Question

I have a string:

$string = 'this is test <b>bold</b> this is another test <img src="#"> image' ;

I want to split it so that HTML tags and normal text end up as separate items.

I need the following output:

[0] => this is test
[1] => <b>bold</b>
[2] => this is another test
[3] => <img src="#">
[4] => image

I am using this code:

$strip = preg_split('/\s+(?![^<>]+>)/m', $string , -1, PREG_SPLIT_DELIM_CAPTURE) ;

Output:

[0] => this
[1] => is
[2] => test
[3] => <b>bold</b>
[4] => this
[5] => .....

I am a beginner. Please help!

Answer 1

I find it easier to get that result with preg_match_all:

$string = 'this is test <b>bold</b> this is another test <img src="#"> image <hr/>';
preg_match_all('/<([^\s>]+)(.*?)>((.*?)<\/\1>)?|(?<=^|>)(.+?)(?=$|<)/i',$string,$result);
$result = $result[0];
// assign the result to the variable
foreach ($result as &$group) {
    $group = preg_replace('/^\s*(.*?)\s*$/','$1',$group);
    // this is to eliminate preceding and trailing spaces
}
print_r($result);

Edit

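I had assumed that there must be at least one character between the opening and closing tags, but that is not required, so I changed the second + to *. I also accounted for tag names that differ in case (the i flag).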
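To illustrate the edit above, here is a minimal sketch that compares the answer's pattern with a variant using + between the tags. The $strict pattern is only a reconstruction of what the earlier version may have looked like (an assumption, since the earlier version is not shown), and the sample inputs are made up for the comparison.

```php
<?php
// Pattern from the answer: "*" allows nothing between the opening and closing tag.
$pattern = '/<([^\s>]+)(.*?)>((.*?)<\/\1>)?|(?<=^|>)(.+?)(?=$|<)/i';
// Assumed earlier variant: "+" requires at least one character between the tags.
$strict  = '/<([^\s>]+)(.*?)>((.+?)<\/\1>)?|(?<=^|>)(.+?)(?=$|<)/i';

preg_match_all($pattern, 'a <b></b> c', $loose);
preg_match_all($strict, 'a <b></b> c', $tight);
preg_match_all($pattern, '<B>bold</b>', $mixed);

print_r($loose[0]); // the empty element stays together as "<b></b>"
print_r($tight[0]); // the empty element is split into "<b>" and "</b>"
print_r($mixed[0]); // the i flag lets </b> close <B>
```

The matches are not trimmed here, so the text items keep their surrounding spaces.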

Output:

Array ( [0] => this is test [1] => <b>bold</b> [2] => this is another test [3] => <img src="#"> [4] => image [5] => <hr/> )

Edit 2:

This does not work for the irregular cases given as examples in the comments:

foobaritalic or foobarbazfail

To make it work, the regex would have to be adjusted to look inside the matches and handle them accordingly.
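One way to "look inside the matches" is to stop pairing tags inside a single regex and track nesting depth instead. The sketch below is not the answer's method: splitTopLevel is a hypothetical helper, the list of void elements is a partial one chosen for illustration, and the nested input is a made-up example. It tokenizes with preg_split and groups everything between a top-level opening tag and its matching close.

```php
<?php
// Hypothetical helper, not part of the original answer.
function splitTopLevel(string $html): array
{
    // Elements without a closing tag (a partial list, chosen for illustration).
    $void = ['img', 'hr', 'br', 'input', 'meta', 'link'];

    // Tokenize into tags and text; the capture group keeps the tags in the result.
    $tokens = preg_split('/(<[^>]+>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    $result = [];
    $buffer = '';   // collects everything inside an open top-level element
    $depth  = 0;    // number of currently open elements

    foreach ($tokens as $token) {
        $isTag = preg_match('/^<\s*(\/?)\s*([a-z][a-z0-9]*)[^>]*?(\/?)>$/i', $token, $m) === 1;

        if (!$isTag) {
            // Plain text: emit it at top level, otherwise keep it with its parent.
            if ($depth === 0) {
                $text = trim($token);
                if ($text !== '') {
                    $result[] = $text;
                }
            } else {
                $buffer .= $token;
            }
            continue;
        }

        $isClosing = $m[1] === '/';
        $isVoid    = $m[3] === '/' || in_array(strtolower($m[2]), $void, true);

        if ($isVoid && !$isClosing) {
            // Void or self-closing tags never change the depth.
            if ($depth === 0) {
                $result[] = $token;
            } else {
                $buffer .= $token;
            }
            continue;
        }

        $buffer .= $token;
        $depth = $isClosing ? max(0, $depth - 1) : $depth + 1;

        if ($depth === 0) {
            // Back at top level: the buffered element is complete.
            $result[] = $buffer;
            $buffer = '';
        }
    }

    if ($buffer !== '') {
        $result[] = $buffer; // unclosed element: emit what was collected
    }

    return $result;
}

print_r(splitTopLevel('<b>foo<b>bar</b><i>italic</i></b> tail'));
```

On the string from the question this yields the five items requested there. It is still a regex tokenizer, so attribute values that contain > or unquoted values ending in / will confuse it; for arbitrary HTML a real parser is the safer choice.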
