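Below is my code. I am trying to scrape the following URL, but for some reason the HTML source is not being retrieved at all. Why does scraping fail on this URL?
I also tried file_get_contents and the Simple HTML DOM library, but neither of them returned the page.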
URL: http://www.zazzle.com/protoceratops_t_shirt-235065458404753105
function get_data($url) {
    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

echo get_data('http://www.zazzle.com/protoceratops_t_shirt-235065458404753105');
Answer 0 (score: 0)
You can try this:
function get_data($url) {
    try {
        $ch = curl_init();
        $timeout = 5;

        // curl_init() returns FALSE if the handle could not be created.
        if (FALSE === $ch) {
            throw new Exception('failed to initialize');
        }

        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);

        $content = curl_exec($ch);

        // curl_exec() returns FALSE when the transfer fails; surface the reason.
        if (FALSE === $content) {
            throw new Exception(curl_error($ch), curl_errno($ch));
        }

        // ...process $content now
        return $content;
    } catch (Exception $e) {
        // Report the cURL error number and message instead of failing silently.
        trigger_error(sprintf(
            'Curl failed with error #%d: %s',
            $e->getCode(), $e->getMessage()),
            E_USER_ERROR);
    }
}

echo get_data('http://www.zazzle.com/protoceratops_t_shirt-235065458404753105');
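If the request fails, this also reports the cURL error number and message, so you can see why nothing was returned.
All credit goes to: curl_exec() always returns false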
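The function above only reports transport-level failures, because curl_exec() returns FALSE only when the transfer itself fails. A redirect that is not followed, or an HTTP error page, still comes back as a (possibly empty) string, so it can also help to look at the status code. The following is a minimal sketch of that idea, not a confirmed fix for this URL. The helper name fetch_with_diagnostics, the User-Agent string, and the choice to follow redirects are all assumptions added for illustration.

```php
<?php
/**
 * Hypothetical helper (not from the original answer): fetch a URL and
 * return the body together with basic diagnostics.
 */
function fetch_with_diagnostics(string $url, int $timeout = 5): array
{
    $ch = curl_init();
    if ($ch === false) {
        throw new RuntimeException('failed to initialize cURL');
    }

    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout * 2);

    // Assumption: the site may answer with a redirect, so follow a few hops.
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);

    // Assumption: some servers respond differently when no User-Agent is sent.
    // The value below is an illustrative placeholder.
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)');

    $body = curl_exec($ch);

    // Collect the diagnostics before the handle goes out of scope.
    return [
        'errno'         => curl_errno($ch),   // 0 when the transfer succeeded
        'error'         => curl_error($ch),   // empty string when there was no error
        'http_code'     => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'effective_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
        'body'          => ($body === false) ? null : $body,
    ];
}
```

Calling this helper with the URL from the question and dumping everything except the body (for example with var_dump) shows whether the request failed at the transport level (non-zero errno), was redirected (effective_url differs from the requested URL), or was answered with an HTTP error status.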