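I am trying to retrieve all images from this URL: http://www.homegate.ch/kaufen/105652197?3
I am using XPath in PHP. For some reason, I can retrieve the body with XPath, but not the images. Here is my script: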
<?php
$url = "http://www.homegate.ch/kaufen/105652197?3";
$body = '//body';
$img = '//img';
$html = file_get_contents($url);
# Call htmlentities as the $url content is not well-formatted: http://stackoverflow.com/questions/1685277/warning-domdocumentloadhtml-htmlparseentityref-expecting-in-entity
$html = htmlentities($html);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DomXPath($dom);
$query = $xpath->query($body);
if($query->length == 1)
echo $query->item(0)->nodeValue;
if($query->length < 1)
echo "Xpath for body is no good!";
$query = $xpath->query($img);
if($query->length == 1)
echo $query->item(0)->nodeValue;
if($query->length < 1)
echo "Xpath for image is no good!";
Running this script returns:
1. <!DOCTYPE html>..
2. Xpath for image is no good!
What is going wrong here? Why does the XPath query work for body but not for img?
Answer 0 (score: 0)
You have to remove this line:
$html = htmlentities( $html );
To avoid the DOM warnings, use this syntax instead:
$dom = new DOMDocument();
libxml_use_internal_errors( True ); # <-------
$dom->loadHTML( $html );
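Putting the two changes together, a minimal sketch of the image extraction might look like the following. It is not the only way to do this. The inline `$html` string and the helper name `extractImageSources` are assumptions made for illustration, so the example runs without fetching the real page. Note that an img element has no text content, so the sketch reads the `src` attribute rather than `nodeValue`, and it loops over every match instead of checking for exactly one.

```php
<?php
// Hypothetical markup standing in for the downloaded page; the bare "&"
// is deliberately malformed to trigger the kind of warning libxml reports.
$html = '<html><body><p>Tom & Jerry</p>'
      . '<img src="/images/a.jpg" alt="A"><img src="/images/b.jpg" alt="B">'
      . '</body></html>';

// Hypothetical helper: returns the src attribute of every img element.
function extractImageSources(string $html): array
{
    // Collect libxml parse errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html); // raw markup, no htmlentities()

    // Discard the collected errors and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $sources = [];
    foreach ($xpath->query('//img') as $img) {
        $sources[] = $img->getAttribute('src');
    }
    return $sources;
}

print_r(extractImageSources($html));
```

For the real page, `$html` would come from `file_get_contents($url)` as in the question; relative `src` values would still need to be resolved against the site's base URL.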
With your original syntax, the //body XPath query apparently works, but it returns this content:
<!DOCTYPE html>
<html prefix="og: http://ogp.me/ns#" class="no-js unknown unknown" lang="de">
<head><script type="text/javascript" src="/ver-20160426133955/assets/js/jquery.js"></script>
(...)
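Clearly, this is not the body! Because htmlentities() escapes every < and > in the page, the parser no longer sees any tags: it treats the whole page as plain text and wraps it in a generated body element. That is why //body matches (its text is the original markup) while //img finds nothing.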
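The effect can be reproduced on a small scale. The sketch below is an illustration only: the inline `$html` string and the helper name `countNodes` are hypothetical. It counts XPath matches for the same markup before and after `htmlentities()`.

```php
<?php
// Hypothetical one-image page used only for this demonstration.
$html = '<html><body><img src="/images/a.jpg"></body></html>';

// Hypothetical helper: parse $markup as HTML and count matches for $query.
function countNodes(string $markup, string $query): int
{
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($markup);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return (new DOMXPath($dom))->query($query)->length;
}

// After escaping, every tag becomes literal text such as "&lt;img ...&gt;".
$escaped = htmlentities($html);

echo "img in raw markup:      ", countNodes($html, '//img'), "\n";
echo "img in escaped markup:  ", countNodes($escaped, '//img'), "\n";
echo "body in escaped markup: ", countNodes($escaped, '//body'), "\n";
```

The raw markup yields an img match, whereas the escaped version yields none; its only body is the one the parser generates around the text.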