I want to use file_get_html() to fetch and process a web page, so I tried to do it in a better way with the cURL functions, as shown below:
function file_get_html_new($url, $use_include_path = false, $context = null)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HTTPHEADER, $context);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    $contents = curl_exec($ch);
    curl_close($ch);
    if (empty($contents) || strlen($contents) > MAX_FILE_SIZE)
    {
        return false;
    }
    $dom->load($contents, $lowercase, $stripRN);
    return $dom;
}
$html = file_get_html_new('http://***.us/' . $imdb_id, false, array(
    'Host: ***.us',
    'Connection: keep-alive',
    'Cache-Control: max-age=0',
    'Upgrade-Insecure-Requests: 1',
    'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36',
    'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Encoding: gzip, deflate, sdch',
    'Accept-Language: en-US,en;q=0.8,fa;q=0.6',
    'Cookie: ***',
    'AlexaToolbar-ALX_NS_PH: AlexaToolbar/alx-4.0'
));
But I get the following error when I run the code:
PHP Fatal error: Call to a member function load() on a non-object
So I printed the cURL result directly, like this:
function file_get_html_new($url, $use_include_path = false, $context = null)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HTTPHEADER, $context);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    $contents = curl_exec($ch);
    curl_close($ch);
    if (empty($contents) || strlen($contents) > MAX_FILE_SIZE)
    {
        return false;
    }
    //$dom->load($contents, $lowercase, $stripRN);
    echo $contents;
}
The output was garbled, and I found that this is because the content is gzip-compressed. I decompressed it like this:
$contents = gzinflate( substr(curl_exec($ch),10,-8) );
and I also tried:
$contents = gzdecode (curl_exec($ch));
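To compare the two decompression calls without touching the network, here is a minimal sketch. The sample string and the variable names are made up for illustration; gzencode() only stands in for the compressed body that curl_exec() would return.

```php
<?php
// Hypothetical payload standing in for the gzip-compressed response body.
$original   = '<html><body>Hello</body></html>';
$compressed = gzencode($original);

// gzdecode() reads the gzip header and trailer on its own.
$viaGzdecode = gzdecode($compressed);

// gzinflate() expects a raw DEFLATE stream, so the 10-byte gzip header and
// the 8-byte trailer (CRC32 + length) are cut off by hand. This only works
// when the header carries no optional fields such as a file name.
$viaGzinflate = gzinflate(substr($compressed, 10, -8));

var_dump($viaGzdecode === $original);
var_dump($viaGzinflate === $original);
```

Both calls should give back the same string for a plain gzip stream like this one. Another option that I have not tried here may be to set CURLOPT_ENCODING to an empty string, so that cURL advertises the encodings it supports and decodes the response itself.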
Now I get the correct content, but the error is still there! Can you help me understand why?
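One possible cause, sketched under assumptions: inside file_get_html_new() the variable $dom is never assigned (and neither are $lowercase and $stripRN), so load() would be called on something that is not an object no matter what $contents holds. Presumably the original file_get_html() creates its parser object before calling load(). In the sketch below, FakeDom is a hypothetical stand-in for the real parser class, and both function names are invented for illustration.

```php
<?php
// Hypothetical stand-in for the parser class; the real class is not shown in this post.
class FakeDom
{
    public $html = '';

    public function load($str)
    {
        $this->html = $str;
        return $this;
    }
}

// Even if an object named $dom exists outside, a function has its own scope.
$dom = new FakeDom();

function dom_is_defined_here()
{
    // No parameter or "global $dom;" statement brings $dom into this scope.
    return isset($dom);
}

function parse_with_object($contents)
{
    $dom = new FakeDom();   // create the object before calling load()
    $dom->load($contents);
    return $dom;
}

var_dump(dom_is_defined_here());
var_dump(parse_with_object('<p>x</p>')->html);
```

If that is what is happening, fixing the gzip handling would not change the error, because the failure would come from the missing object rather than from the content.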