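I'm working on code that gets all tag values ("text only") from an HTML file. If a tag has nested tags, the code should descend into the children and get the values of the tags that have no children of their own. I tried the following, but I'm a bit lost.
PHP code: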
$dochtml = new DOMDocument();
$dochtml->loadHTMLFile("index2.html");
$nodes = $dochtml->getElementsByTagName("a");
gettagsvalue($nodes);

function gettagsvalue($nodes){
    if($nodes->length != 0){
        for ($i=0;$i<$nodes->length;$i++){
            foreach ($tags=["h1","h2","h3","h4","h5","h6","h7","a","img","li","span","p","pre","i","strong","div","ul"] as $tag){
                if($nodes->item($i)->getElementsByTagName($tag)->length != 0){
                    if ($nodes->item($i)->getElementsByTagName($tag)->length == 1){
                        echo "here"."<br><br><br> $tag";
                        echo "<pre>" ;print_r($nodes->item($i)->getElementsByTagName($tag)->item(0));echo "</pre>" ;
                    }else{
                        echo "there"."<br><br><br> $tag";
                        gettagsvalue($nodes->item($i)->getElementsByTagName($tag));
                        // echo "$tag <br><br><br>";
                    }
                    // print_r($nodes->item($i)->getElementsByTagName($tag));echo "<br>";
                }
            }
        }
    }
}
I expect to get:
"Green" "Valley"
HTML:
<a href="index.html" id="aaaaaaaaaaaa2015284957">
<img src="images/logo.png" width="50px" height="50px" id="imgaaaaaaaaaaimg732756221">
<span>Green</span>
<span id="spanaaaaaaaaaaspan1106733773">Valley</span>
</a>
Answer 0 (score: 0)
Have you considered using the textContent property? It should concatenate the text nodes of all nested nodes. For details, see "php domdocument read element inner text" and "PHP DOM textContent vs nodeValue?".
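As a rough illustration of that suggestion, here is a minimal sketch. It assumes the markup from the question is loaded from an inline string rather than from index2.html, and the variable names ($joined, $texts) are only illustrative. It reads textContent on the link, and also shows an XPath query as one possible way to get each piece of text separately.

```php
<?php
// Assumption: an inline copy of the question's markup, so the sketch runs on its own.
$html = <<<'HTML'
<a href="index.html" id="aaaaaaaaaaaa2015284957">
<img src="images/logo.png" width="50px" height="50px" id="imgaaaaaaaaaaimg732756221">
<span>Green</span>
<span id="spanaaaaaaaaaaspan1106733773">Valley</span>
</a>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);

$link = $doc->getElementsByTagName('a')->item(0);

// textContent joins every descendant text node, including the whitespace
// between the tags, so collapse runs of whitespace afterwards.
$joined = preg_replace('/\s+/', ' ', trim($link->textContent));

// Alternative: collect each non-blank text node under the link separately.
$xpath = new DOMXPath($doc);
$texts = [];
foreach ($xpath->query('.//text()[normalize-space()]', $link) as $textNode) {
    $texts[] = trim($textNode->nodeValue);
}

echo $joined, PHP_EOL;
echo implode(' | ', $texts), PHP_EOL;
```

With the sample markup, $texts should hold the two strings "Green" and "Valley", which matches the output asked for in the question. The XPath variant avoids the hard-coded tag list and the recursion, because text() already reaches text nodes at any depth.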