Question

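I learned about scraping and cURL just a few hours ago, and I have been playing with them ever since. However, I am now running into something strange. The code below works for some sites and not for others (of course, I changed the URL and the XPath each time). Note that no error is raised when I test whether curl_exec ran correctly, so the problem must come from somewhere else. Some of my questions: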

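How can I check that the new DOMDocument was created correctly: if (??)
How can I check that the new DOMDocument was correctly populated with the HTML?
...and that the new DOMXPath object was created?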

I hope I was clear. Thanks in advance for your replies. Cheers. Marc

My PHP:

<?php
$target_url = "http://www.somesite.com";
$userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';

// make the cURL request to $target_url
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL,$target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html= curl_exec($ch);

if (!$html) {
    echo "<br />cURL error number:" .curl_errno($ch);
    echo "<br />cURL error:" . curl_error($ch);
    exit;
}

// parse the html into a DOMDocument
$dom = new DOMDocument();
@$dom->loadHTML($html);

// grab all the links on the page
$xpath = new DOMXPath($dom);
$hrefs = $xpath->query('somepath');

for ($i = 0; $i < $hrefs->length; $i++) {
    $href = $hrefs->item($i);
    $url = $href->getAttribute('href');
    echo "<br />Link: $url";
}

?>

Answer 1

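Use try / catch to check whether the document object was created, then check the return value of loadHTML() to determine whether the HTML was loaded into the document. You can use try / catch on the XPath object as well.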

try
{
    $dom = new DOMDocument();

    $loaded = $dom->loadHTML($html);

    if($loaded)
    {
        // loaded OK
    }
    else
    {
        // could not load HTML
    }
}
catch(Exception $e)
{
    // document could not be created, see $e->getMessage()
}
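
As a complement to the snippet above, here is a minimal sketch of how the parser warnings and the XPath result could be inspected instead of being silenced with @. The helper names loadHtmlDocument() and queryNodes() and the sample markup are hypothetical and only there for illustration; the sketch relies on libxml_use_internal_errors() to collect parse problems and on DOMXPath::query() returning false for a malformed expression.

```php
<?php
// Hypothetical helper (not from the original post): loads HTML and returns
// the document together with any libxml parse errors that were collected.
function loadHtmlDocument(string $html): array
{
    // Collect libxml errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    // Guard against an empty string, which loadHTML() rejects.
    $loaded = ($html !== '') && $dom->loadHTML($html);

    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$loaded ? $dom : null, $errors];
}

// Hypothetical helper: runs an XPath expression and returns null when the
// expression itself is malformed (query() returns false in that case).
function queryNodes(DOMDocument $dom, string $expression): ?DOMNodeList
{
    $xpath = new DOMXPath($dom);
    $nodes = @$xpath->query($expression); // silence the "invalid expression" warning
    return $nodes === false ? null : $nodes;
}

// Sample markup standing in for the cURL response (an assumption).
$html = '<html><body><a href="/a">A</a><a href="/b">B</a></body></html>';

[$dom, $errors] = loadHtmlDocument($html);

if ($dom === null) {
    echo "Could not load HTML\n";
} else {
    foreach ($errors as $error) {
        echo 'libxml: ', trim($error->message), "\n";
    }

    $links = queryNodes($dom, '//a');
    if ($links === null) {
        echo "Invalid XPath expression\n";
    } elseif ($links->length === 0) {
        echo "Expression is valid but matched nothing\n";
    } else {
        foreach ($links as $link) {
            echo 'Link: ', $link->getAttribute('href'), "\n";
        }
    }
}
```

Separating "invalid expression" from "valid expression, zero matches" is the useful part here: the second case usually means the path does not fit the HTML that was actually downloaded, not that DOMDocument or DOMXPath failed.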

Answer 2

Problem solved. The error came from Firebug, which gave me a wrong path. Many thanks to MrCode for the support...
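
The post does not say what exactly was wrong with the path. One common cause, assumed here purely for illustration, is that a browser tool reports the path of the browser's own DOM, where a tbody element is inserted into tables even when the downloaded HTML has none. A small sketch along these lines, with made-up sample markup and candidate paths, can show which expression actually matches the HTML that PHP parsed:

```php
<?php
// Sample markup standing in for a fetched page (an assumption for illustration).
// Note that the table has no <tbody> in the source.
$html = '<html><body><table><tr><td><a href="/page">Page</a></td></tr></table></body></html>';

// Collect libxml parse errors quietly instead of using @.
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);

// Hypothetical candidate expressions to compare.
$candidates = [
    '/html/body/table/tbody/tr/td/a', // as a browser tool might report it
    '//table//a',                     // looser path that ignores intermediate elements
];

$counts = [];
foreach ($candidates as $expression) {
    $nodes = $xpath->query($expression);
    // -1 marks a malformed expression, 0 a valid one that matched nothing.
    $counts[$expression] = ($nodes === false) ? -1 : $nodes->length;
    echo $expression, ' => ', $counts[$expression], " match(es)\n";
}
```

Printing the match count per expression makes it quick to tell whether the path is the problem, since a count of 0 with a successfully loaded document points at the expression and not at cURL or DOMDocument.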

PHP scraping with cURL - how to debug
