I have to parse the following HTML code:
<ul>
<li><span><input id="testing_5" type="checkbox" name="filter" value="5"></span><label for="testing_5"><div>Label 1</div><span>579<span></label></li>
<li><span><input id="testing_4" type="checkbox" name="filter" value="4"></span><label for="testing_4"><div>Label 2</div><span>356<span></label></li>
<li><span><input id="testing_3" type="checkbox" name="filter" value="3"></span><label for="testing_3"><div>Label 3</div><span>109<span></label></li>
<li><span><input id="testing_2" type="checkbox" name="filter" value="2"></span><label for="testing_2"><div>Label 4</div><span>32<span></label></li>
<li><span><input id="testing_1" type="checkbox" name="filter" value="1"></span><label for="testing_1"><div>Label 5</div><span>13<span></label></li>
</ul>
I want to print the markup of each label element, so I wrote a simple PHP script like this:
$scrape_obj = str_get_html('<ul><li><span><input id="testing_5" type="checkbox" name="filter" value="5"></span><label for="testing_5"><div>Label 1</div><span>579<span></label></li><li><span><input id="testing_4" type="checkbox" name="filter" value="4"></span><label for="testing_4"><div>Label 2</div><span>356<span></label></li><li><span><input id="testing_3" type="checkbox" name="filter" value="3"></span><label for="testing_3"><div>Label 3</div><span>109<span></label></li><li><span><input id="testing_2" type="checkbox" name="filter" value="2"></span><label for="testing_2"><div>Label 4</div><span>32<span></label></li><li><span><input id="testing_1" type="checkbox" name="filter" value="1"></span><label for="testing_1"><div>Label 5</div><span>13<span></label></li></ul>');
$obj = $scrape_obj->find("label[for^='testing_']");
for($i=0; $i<count($obj); $i++) {
echo "\n Number $i\n $obj[$i]\n\n";
}
This is the output:
Number 0
<label for="testing_5"><div>Label 1</div><span>579<span></label></li><li><span><input id="testing_4" type="checkbox" name="filter" value="4"></span><label for="testing_4"><div>Label 2</div><span>356<span></label></li><li><span><input id="testing_3" type="checkbox" name="filter" value="3"></span><label for="testing_3"><div>Label 3</div><span>109<span></label></li><li><span><input id="testing_2" type="checkbox" name="filter" value="2"></span><label for="testing_2"><div>Label 4</div><span>32<span></label></li><li><span><input id="testing_1" type="checkbox" name="filter" value="1"></span><label for="testing_1"><div>Label 5</div><span>13<span></label></li></ul>
Number 1
<label for="testing_4"><div>Label 2</div><span>356<span></label></li><li><span><input id="testing_3" type="checkbox" name="filter" value="3"></span><label for="testing_3"><div>Label 3</div><span>109<span></label></li><li><span><input id="testing_2" type="checkbox" name="filter" value="2"></span><label for="testing_2"><div>Label 4</div><span>32<span></label></li><li><span><input id="testing_1" type="checkbox" name="filter" value="1"></span><label for="testing_1"><div>Label 5</div><span>13<span></label></li></ul>
Number 2
<label for="testing_3"><div>Label 3</div><span>109<span></label></li><li><span><input id="testing_2" type="checkbox" name="filter" value="2"></span><label for="testing_2"><div>Label 4</div><span>32<span></label></li><li><span><input id="testing_1" type="checkbox" name="filter" value="1"></span><label for="testing_1"><div>Label 5</div><span>13<span></label></li></ul>
Number 3
<label for="testing_2"><div>Label 4</div><span>32<span></label></li><li><span><input id="testing_1" type="checkbox" name="filter" value="1"></span><label for="testing_1"><div>Label 5</div><span>13<span></label></li></ul>
Number 4
<label for="testing_1"><div>Label 5</div><span>13<span></label></li></ul>
The correct output should be:
Number 0
<label for="testing_5"><div>Label 1</div><span>579<span></label>
Number 1
<label for="testing_4"><div>Label 2</div><span>356<span></label>
Number 2
<label for="testing_3"><div>Label 3</div><span>109<span></label>
Number 3
<label for="testing_2"><div>Label 4</div><span>32<span></label>
Number 4
<label for="testing_1"><div>Label 5</div><span>13<span></label>
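How can I solve this?
Solution
The problem is the unclosed span tags: each count is written as <span>579<span> instead of <span>579</span>, so the parser never sees the end of the span and each label runs on to the end of the document. You can fix this with a simple regular expression applied before parsing: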
$pattern = "/<span>([0-9]+)<span>/";
$replacement = "<span>$1</span>";
$html_code = preg_replace($pattern, $replacement, $html_code);
Here $html_code contains the code to parse.
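As a minimal, self-contained sketch of the regex fix above, the snippet below applies the same pattern to one label taken from the question. The helper name close_count_spans is hypothetical and used only for illustration; it assumes the broken spans always wrap a plain run of digits, as they do in the sample markup.

```php
<?php
// Hypothetical helper (not part of any library): repairs "<span>NNN<span>"
// by turning the second, mistyped opening tag into a closing tag.
function close_count_spans(string $html): string
{
    // Assumption: the broken spans contain only digits, as in the question.
    return preg_replace('/<span>([0-9]+)<span>/', '<span>$1</span>', $html);
}

// One label from the question, with the unclosed span.
$html_code = '<label for="testing_5"><div>Label 1</div><span>579<span></label>';
$fixed = close_count_spans($html_code);
echo $fixed, "\n";
```

The repaired string can then be passed to str_get_html() in place of the original markup, so that each label ends where it should.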
Answer 0 (score: 0)
You can use a combination of substr() and strpos() to find where the first label block ends.
Put this in your loop, before the echo:
$obj[$i] = substr($obj[$i],0,strpos($obj[$i],'</label>')+8);
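The same truncation can be sketched on a plain string, without the parser, to show what the line above does. The helper name truncate_after_label is hypothetical, and the guard for a missing closing tag is an addition for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper: cut a string right after its first closing </label> tag.
function truncate_after_label(string $markup): string
{
    $closing = '</label>';
    $pos = strpos($markup, $closing);
    if ($pos === false) {
        // No closing tag found: return the input unchanged.
        return $markup;
    }
    // strlen($closing) is 8, matching the +8 used in the one-liner above.
    return substr($markup, 0, $pos + strlen($closing));
}

// Stand-in for the string form of one over-long result from the question.
$outer = '<label for="testing_1"><div>Label 5</div><span>13<span></label></li></ul>';
$truncated = truncate_after_label($outer);
echo $truncated, "\n";
```

Note that this only trims the printed string; the span inside the label is still unclosed, so this approach suits display purposes rather than further parsing.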