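I recently deleted a script I wrote a few months ago to fetch some basic data from Amazon (I now have access to the API, so it is no longer needed), but it still bugs me that I cannot see the error in the script.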
<?php
# Example URLs:
# https://www.amazon.co.uk/s?k=22+Bdc+Scope&ref=nb_sb_noss
# https://www.amazon.co.uk/s?k=Angled+Ceiling+Speaker&ref=nb_sb_noss
$url = "https://www.amazon.co.uk/s?k=Angled+Ceiling+Speaker&ref=nb_sb_noss";
$html = file_get_contents($url);
echo parseHtmlAmazonScraper($html);

function parseHtmlAmazonScraper($html) {
    try {
        libxml_use_internal_errors(true);
        $doc = new DOMDocument();
        $doc->loadHTML($html);
        $xpath = new DomXPath($doc);
        $nodeList = $xpath->query("//div[@class='sg-col-4-of-24 sg-col-4-of-12 sg-col-4-of-36 sg-col-4-of-28 sg-col-4-of-16 sg-col sg-col-4-of-20 sg-col-4-of-32']");
        if (sizeof($nodeList) == 0) {
            $nodeList = $xpath->query("//div[@class='sg-col-4-of-12 sg-col-8-of-16 sg-col-16-of-24 sg-col-12-of-20 sg-col-24-of-32 sg-col sg-col-28-of-36 sg-col-20-of-28']");
        }
        $res = [];
        foreach ($nodeList as $node) {
            $new = new DomDocument;
            $new->appendChild($new->importNode($node, true));
            $N = new DomXPath($new);
            $nodeImg = $N->query("//img[@class='s-image']")->item(0);
            $Img = $nodeImg->getAttribute('src');
            $nodeLink = $N->query("//a[@class='a-link-normal a-text-normal']")->item(0);
            $Path = $nodeLink->getAttribute('href');
            $Name = trim($nodeLink->textContent);
            $res[] = [
                'productLink' => $Path,
                'productDescription' => $Name,
                'productImage' => $Img
            ];
        }
        return $res;
    } catch(Exception $e) {
        echo $e->getMessage();
    }
}
?>
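It worked fine when I tested it a few months ago, and when I inspect the HTML structure I cannot see any obvious changes. What I get now is:
Fatal error: Uncaught Error: Call to a member function getAttribute() on null
So I am basically getting null back somewhere, and when I var_dump() $nodeList I get: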
object(DOMNodeList)[47]
public 'length' => int 44
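To narrow this down, here is a minimal sketch of the situation as I understand it: the outer list can be non-empty while an inner query still matches nothing, in which case `DOMNodeList::item(0)` returns null. The markup and the `result` class below are made up for illustration and are not Amazon's real HTML.

```php
<?php
// Hypothetical stand-in markup: two containers, the second has no <img class="s-image">.
$html = <<<'HTML'
<div class="result"><img class="s-image" src="a.jpg"><a class="a-link-normal a-text-normal" href="/a">Item A</a></div>
<div class="result"><a class="a-link-normal a-text-normal" href="/b">Item B</a></div>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Outer query: finds both containers, like the 44 nodes in my var_dump().
$nodeList = $xpath->query("//div[@class='result']");

$missing = [];
for ($i = 0; $i < $nodeList->length; $i++) {
    $node = $nodeList->item($i);
    // Relative query (leading "."), evaluated with $node as the context node.
    $img = $xpath->query(".//img[@class='s-image']", $node)->item(0);
    if ($img === null) {
        // item(0) returns null when the inner list is empty;
        // calling getAttribute() on it would raise the fatal error.
        $missing[] = $i;
    }
}

printf("%d containers, %d without an image\n", $nodeList->length, count($missing));
```

Running the same kind of loop against the real page should show which of the 44 containers lack the image or link I expect.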
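The HTML structure of the test page looks fine to me. Am I missing something obvious here?
Any help in the right direction would be much appreciated; I generally like to understand why things break.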
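For comparison, this is a defensive variant I could imagine, offered only as a sketch. It assumes that some containers legitimately lack an image or link and should be skipped, and that matching a single class token is more robust than matching the full `class` string. The function name `extractProducts`, the sample markup, and the `sg-col` container query are my own placeholders, not something confirmed against the live page.

```php
<?php
// Sketch: extract product data, skipping containers that lack an expected element.
function extractProducts(string $html, string $containerQuery): array
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    // Token-based class tests tolerate extra or reordered class names,
    // unlike an exact @class='...' comparison.
    $imgQuery  = ".//img[contains(concat(' ', normalize-space(@class), ' '), ' s-image ')]";
    $linkQuery = ".//a[contains(concat(' ', normalize-space(@class), ' '), ' a-link-normal ')]";

    $products = [];
    foreach ($xpath->query($containerQuery) as $node) {
        $img  = $xpath->query($imgQuery, $node)->item(0);
        $link = $xpath->query($linkQuery, $node)->item(0);
        if ($img === null || $link === null) {
            continue; // assumption: such a container is not a product tile
        }
        $products[] = [
            'productLink'        => $link->getAttribute('href'),
            'productDescription' => trim($link->textContent),
            'productImage'       => $img->getAttribute('src'),
        ];
    }
    return $products;
}

// Hypothetical sample: one product tile and one container without image or link.
$sample = <<<'HTML'
<div class="sg-col item"><img class="s-image lazy" src="a.jpg"><a class="a-link-normal a-text-normal" href="/a"> Item A </a></div>
<div class="sg-col item"><span>Sponsored</span></div>
HTML;

$products = extractProducts($sample, "//div[contains(concat(' ', normalize-space(@class), ' '), ' sg-col ')]");
print_r($products);
```

This only hides the symptom, though; it does not tell me why the original queries stopped matching.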