I am trying to extract a div from another website using the following code:
<?php
$doc = new DOMDocument;
// We need to validate our document before referring to the id
$doc->validateOnParse = true;
$doc->loadHTML(file_get_contents('http://myaddresshere'));
var_dump($doc->getElementById('the div'));
?>
I do get the result, but it is preceded by a long string of PHP output:
object(DOMElement)#2 (18) { ["tagName"]=> string(2) "h3" ["schemaTypeInfo"]=> NULL ["nodeName"]=> string(2) "h3" ["nodeValue"]=> string(35) "Telephone Technical Support: Active" ["nodeType"]=> int(1) ["parentNode"]=> string(22) "(object value omitted)" ["childNodes"]=> string(22) "(object value omitted)" ["firstChild"]=> string(22) "(object value omitted)" ["lastChild"]=> string(22) "(object value omitted)" ["previousSibling"]=> string(22) "(object value omitted)" ["nextSibling"]=> string(22) "(object value omitted)" ["attributes"]=> string(22) "(object value omitted)" ["ownerDocument"]=> string(22) "(object value omitted)" ["namespaceURI"]=> NULL ["prefix"]=> string(0) "" ["localName"]=> string(2) "h3" ["baseURI"]=> NULL ["textContent"]=> string(35) ***"Telephone Technical Support: Active"*** }
How can I remove all of that extra output and get only the content of the div?
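Answer 0 (score: 0)
You can get the value of a node in two ways. For the plain text, read:
DOMElement->nodeValue; // inherited from DOMNode
or, to keep the markup of the child nodes, use a helper such as: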
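<?php
// Return the markup of all child nodes of $node as one string.
function get_inner_html( $node ) {
    $innerHTML = '';
    $children = $node->childNodes;
    foreach ($children as $child) {
        // Serialize each child through the document that owns it.
        $innerHTML .= $child->ownerDocument->saveXML( $child );
    }
    return $innerHTML;
}
?>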
For details, see http://php.net/manual/en/class.domelement.php and http://www.php.net/manual/en/class.domnode.php.
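To make the difference between the two approaches concrete, here is a minimal sketch that runs without a network request. The inline `$html` string and the `status` id are assumptions standing in for the remote page and the real div id. It also uses `saveHTML()` with a node argument instead of `saveXML()`, which is a swapped-in choice for HTML-style serialization and needs PHP 5.3.6 or later.

```php
<?php
// Hypothetical markup standing in for the page fetched with file_get_contents().
$html = '<html><body><div id="status">'
      . '<h3>Telephone Technical Support: <b>Active</b></h3>'
      . '</div></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// "status" is an assumed id; replace it with the id of the div you need.
$div = $doc->getElementById('status');
if ($div === null) {
    exit("Element not found\n");
}

// Option 1: plain text of the element and all of its descendants.
$text = $div->textContent;

// Option 2: markup of the child nodes, serialized one node at a time.
$inner = '';
foreach ($div->childNodes as $child) {
    $inner .= $doc->saveHTML($child);
}

// echo prints only the strings; var_dump() prints the whole object structure.
echo $text, "\n";
echo $inner, "\n";
```

The long dump in the question comes from calling `var_dump()` on the `DOMElement` itself. Echoing `textContent` (or `nodeValue`) prints only the text, while the loop keeps inner tags such as `<b>`.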