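PHP XPath returns the body node, but not the images

Asked: 2016-04-28 21:31:43

Tags: php xpath

I am trying to retrieve all images from this URL: http://www.homegate.ch/kaufen/105652197?3. I am using XPath in PHP. For some reason, I can retrieve the body with XPath, but not the images. Here is my script: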

<?php

$url = "http://www.homegate.ch/kaufen/105652197?3";

$body = '//body';
$img = '//img';

$html = file_get_contents($url);

# Call htmlentities as the $url content is not well-formatted: http://stackoverflow.com/questions/1685277/warning-domdocumentloadhtml-htmlparseentityref-expecting-in-entity
$html = htmlentities($html);

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DomXPath($dom);

$query = $xpath->query($body);

if($query->length == 1)
    echo $query->item(0)->nodeValue;

if($query->length < 1)
    echo "Xpath for body is no good!";

$query = $xpath->query($img);

if($query->length == 1)
    echo $query->item(0)->nodeValue;

if($query->length < 1)
    echo "Xpath for image is no good!";

Running this script returns:

1. <!DOCTYPE html>..
2. Xpath for image is no good!

What is going wrong here? Why does the XPath query work for body but not for img?

1 answer:

Answer 0: (score: 0)

You have to remove this line:

$html = htmlentities( $html );

To avoid the DOM warnings, use this syntax instead:

$dom = new DOMDocument();
libxml_use_internal_errors( True );         # <-------
$dom->loadHTML( $html );
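
A slightly fuller sketch of the same idea, in case it helps: it switches libxml to internal error collection, parses the markup, then clears the error buffer and restores the previous setting. The sample markup and variable names are made up for illustration; the unescaped & is the kind of input that commonly produces the htmlParseEntityRef warning, though whether a message is recorded depends on the libxml version.

```php
<?php
// Hypothetical sample markup (not the real page); the bare "&" is the kind
// of input that commonly triggers an htmlParseEntityRef warning.
$html = '<html><body><p>Fish & chips</p><img src="a.jpg"></body></html>';

// Collect libxml errors internally instead of emitting PHP warnings,
// remembering the previous setting so it can be restored afterwards.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Optionally inspect what the parser complained about, then empty the buffer.
$errors = libxml_get_errors();
libxml_clear_errors();

// Restore the previous error-handling mode.
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);
$imageCount = $xpath->query('//img')->length;

printf("parser messages: %d, images found: %d\n", count($errors), $imageCount);
```

Clearing the buffer matters mostly in long-running scripts, where collected errors would otherwise accumulate across documents.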

With your syntax, the //body XPath query does appear to work, but it returns this content:

<!DOCTYPE html>
<html prefix="og: http://ogp.me/ns#" class="no-js unknown unknown" lang="de">
<head><script type="text/javascript" src="/ver-20160426133955/assets/js/jquery.js"></script>
(...)
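Clearly, that is not the body! htmlentities() escapes every < and > in the page, so DOMDocument sees the whole document as plain text and wraps it in a generated body element. That is why //body matches (its text is the escaped page source) while //img matches nothing.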
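
To make the difference concrete, here is a minimal sketch that runs the same //img query on escaped and unescaped markup. The inline HTML string and the imageSources() helper are hypothetical stand-ins, not part of the original script, and the type hints assume PHP 7 or later. It reads the src attribute rather than nodeValue, because an img element has no text content, so echoing nodeValue would print an empty string even when the query matches.

```php
<?php
// Hypothetical stand-in for the downloaded page.
$html = '<html><body><h1>Listing</h1>'
      . '<img src="/img/one.jpg"><img src="/img/two.jpg"></body></html>';

/**
 * Hypothetical helper: parse an HTML string and return the src attribute
 * of every <img> element found in it.
 */
function imageSources(string $markup): array
{
    // Silence parser warnings for the duration of the parse only.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($markup);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $sources = [];
    // Select the src attribute nodes directly.
    foreach ($xpath->query('//img/@src') as $attr) {
        $sources[] = $attr->nodeValue;
    }
    return $sources;
}

$escaped = imageSources(htmlentities($html)); // markup turned into plain text
$raw     = imageSources($html);               // markup parsed as elements

var_dump($escaped); // empty: the escaped version contains no <img> elements
print_r($raw);      // the two src values from the sample markup
```

Note also that the original script only prints something when the query returns exactly one node, so a page with several images would produce no output even after the fix; looping over the result, as above, avoids that.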