I'm desperate to find a solution for turning this text string
<h6>First pane</h6>
... pane content ...
<h6>Second pane</h6>
Hi, this is a comment.
To delete a comment, just log in and view the post's comments.
There you will have the option to edit
or delete them.
<h6>Last pane</h6>
... last pane content ...
into a PHP array.
I need it split like this:
1.
1.0=> First pane
1.1=> ... pane content ...
2.
2.0=> Second pane
2.1=> Hi, this is a comment.
To delete a comment, just log in and view the post's comments.
There you will have the option to edit
or delete them.
3.
3.0=> Last pane
3.1=> ... last pane content ...
Answer 0 (score: 1)
Your regular expression should look like this:
/<h6>([^<]+)<\/h6>([^<]+)/im
If you run the following script, you will see that the values you are looking for are in $matches[1] and $matches[2].
$s = "<h6>First pane</h6>
... pane content ...
<h6>Second pane</h6>
Hi, this is a comment.
To delete a comment, just log in and view the post's comments.
There you will have the option to edit
or delete them.
<h6>Last pane</h6>
... last pane content ...";
$r = "/<h6>([^<]+)<\/h6>([^<]+)/im";
$matches = array();
preg_match_all($r,$s,$matches);
print_r($matches);
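To get from $matches to the [title, content] pairs the question asks for, the two capture arrays can be zipped together with array_map(). This is a minimal sketch that assumes the same input string and regex as above. The trim() calls are an addition that is not part of the answer; they strip the newlines the content group picks up around each pane.

```php
<?php
// Same input as the question (copied verbatim).
$s = "<h6>First pane</h6>
... pane content ...
<h6>Second pane</h6>
Hi, this is a comment.
To delete a comment, just log in and view the post's comments.
There you will have the option to edit
or delete them.
<h6>Last pane</h6>
... last pane content ...";

$r = '/<h6>([^<]+)<\/h6>([^<]+)/im';
preg_match_all($r, $s, $matches);

// Pair each heading ($matches[1]) with the text that follows it ($matches[2]).
// trim() is an added step that removes surrounding newlines.
$panes = array_map(
    function ($title, $content) {
        return [trim($title), trim($content)];
    },
    $matches[1],
    $matches[2]
);

print_r($panes);
```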
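Answer 1 (score: 1)
You shouldn't try to parse HTML with regular expressions. For anything but the simplest HTML this is a recipe for pain and misery, and it will break as soon as anything in your document's structure changes. Use a proper HTML or DOM parser instead, such as PHP's DOMDocument: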
http://php.net/manual/en/class.domdocument.php
For example, you can use getElementsByTagName (http://www.php.net/manual/en/domdocument.getelementsbytagname.php) to get all the h6 elements.
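A minimal sketch of the DOMDocument approach follows. It assumes the input is an HTML fragment like the question's, and that libxml places the text between headings as sibling text nodes of each <h6> inside the implied <body>. The loop collects every sibling up to the next <h6>. The variable names and the trimming are illustrative choices, not part of the answer.

```php
<?php
// Input in the question's format (treated as an HTML fragment).
$html = "<h6>First pane</h6>
... pane content ...
<h6>Second pane</h6>
Hi, this is a comment.
To delete a comment, just log in and view the post's comments.
There you will have the option to edit
or delete them.
<h6>Last pane</h6>
... last pane content ...";

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$panes = [];
foreach ($doc->getElementsByTagName('h6') as $h6) {
    // Concatenate the text of all following siblings until the next <h6>.
    $content = '';
    for ($node = $h6->nextSibling; $node !== null; $node = $node->nextSibling) {
        if ($node->nodeName === 'h6') {
            break;
        }
        $content .= $node->textContent;
    }
    $panes[] = [trim($h6->textContent), trim($content)];
}

print_r($panes);
```

Unlike the regex, this loop does not stop at the first `<` in the content. Inline markup such as `<b>` inside a pane would be included as text rather than ending the match.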
Answer 2 (score: 0)
I believe you're looking for the PREG_SET_ORDER flag.
$regex = '~<h6>([^<]+)</h6>\s*([^<]+)~i';
preg_match_all($regex, $source, $matches, PREG_SET_ORDER);
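That way, each element of the $matches array is itself an array holding the whole match, followed by all the group captures from that single match attempt. The result for the first match looks like this: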
Array ( [0] => Array ( [0] => First pane ... pane content ... [1] => First pane [2] => ... pane content ... )
EDIT: Note the \s* I added. Without it, the matched content would always start with a line separator.
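With PREG_SET_ORDER, each set already corresponds to one pane, so reshaping the result into the question's [title, content] pairs takes a single loop. This is a minimal sketch in which $source is a copy of the question's text. The rtrim() call is an addition, not part of the answer: \s* removes the leading newline, but [^<]+ still captures the trailing one.

```php
<?php
$source = "<h6>First pane</h6>
... pane content ...
<h6>Second pane</h6>
Hi, this is a comment.
To delete a comment, just log in and view the post's comments.
There you will have the option to edit
or delete them.
<h6>Last pane</h6>
... last pane content ...";

$regex = '~<h6>([^<]+)</h6>\s*([^<]+)~i';
preg_match_all($regex, $source, $matches, PREG_SET_ORDER);

// Each $match is [full match, heading, content]; keep only the two groups.
$panes = [];
foreach ($matches as $match) {
    // rtrim() drops the newline that [^<]+ captures before the next <h6>.
    $panes[] = [$match[1], rtrim($match[2])];
}

print_r($panes);
```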