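Recently I tried to scrape a website with curl, but I get nothing when I view the result in the browser. I found that the page it returns contains the tag
<meta name="robots" content="noindex,nofollow">
The code I have been using to scrape is: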
function readThisPage($url) {
    // Request headers that mimic a desktop browser.
    $curlHeaders = array(
        'Accept-Language: en-us',
        'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.15) Gecko/20110303 Firefox/3.6.15',
        'Connection: Keep-Alive',
    );
    $ch = curl_init($url);
    // Return the response body as a string instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $curlHeaders);
    // Give up after 60 seconds.
    curl_setopt($ch, CURLOPT_TIMEOUT, 60);
    $buffer = curl_exec($ch);
    curl_close($ch);
    return $buffer;
}
I also tried file_get_html().
Both return the same result. How can I scrape this kind of website?