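I want to use PHP's cURL to request a page on an external website and get that page's entire HTML content.
When I request the page, the site redirects me to another page on the same site. I also have to set the user agent: I want one for Chrome on a Windows 7 PC and one for an iPhone 4S. This is what I have so far: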
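$ch = curl_init($url);
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
// Send a Referer header automatically when following a redirect
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
// Follow "Location:" redirects
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
// Return the response as a string instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$kl = curl_exec($ch);
curl_close($ch);
echo $kl;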
Note:
I may run into more errors.
Answer 0 (score: 6)
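You may also need to account for URLs that use https:
// CURLOPT_COOKIEFILE / CURLOPT_COOKIEJAR expect a file path; tmpfile() returns a resource, so use tempnam()
$cookie = tempnam(sys_get_temp_dir(), 'cookie');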
// Note: "Windows NT 6.2" identifies Windows 8; use "Windows NT 6.1" to present as Windows 7
$userAgent = 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31';
$ch = curl_init($url);
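$options = array(
    CURLOPT_CONNECTTIMEOUT => 20,
    CURLOPT_USERAGENT => $userAgent,
    CURLOPT_AUTOREFERER => true,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_RETURNTRANSFER => true,
    // Read cookies from and write them back to the same file so they survive the redirect
    CURLOPT_COOKIEFILE => $cookie,
    CURLOPT_COOKIEJAR => $cookie,
    // Disables TLS certificate checks; acceptable for a quick test, not for production
    CURLOPT_SSL_VERIFYPEER => 0,
    CURLOPT_SSL_VERIFYHOST => 0
);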
curl_setopt_array($ch, $options);
$kl = curl_exec($ch);
curl_close($ch);
echo $kl;
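The question asks for two user agents and mentions possible further errors, so here is a minimal sketch that wraps the options above in a reusable function and reports failures instead of echoing blindly. The helper names (`buildCurlOptions`, `fetchPage`), the `USER_AGENTS` constant, and both user-agent strings are assumptions made for illustration; the iPhone string is a typical iOS 5 Safari string and does not identify the 4S model specifically.

```php
<?php
// Illustrative user-agent strings (assumptions, not taken from the thread).
// "Windows NT 6.1" is the token browsers send on Windows 7.
const USER_AGENTS = [
    'win7_chrome' => 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31',
    'iphone'      => 'Mozilla/5.0 (iPhone; CPU iPhone OS 5_0 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A334 Safari/7534.48.3',
];

// Hypothetical helper: build the cURL option array for one request.
function buildCurlOptions(string $userAgent, string $cookieFile): array
{
    return [
        CURLOPT_CONNECTTIMEOUT => 20,
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_AUTOREFERER    => true,
        CURLOPT_FOLLOWLOCATION => true,   // follow the site's redirect
        CURLOPT_MAXREDIRS      => 10,     // but do not loop forever
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_COOKIEFILE     => $cookieFile,
        CURLOPT_COOKIEJAR      => $cookieFile,
    ];
}

// Hypothetical helper: fetch a page and return the body plus diagnostics.
function fetchPage(string $url, string $userAgent): array
{
    // The cookie file is left in the system temp directory.
    $cookieFile = tempnam(sys_get_temp_dir(), 'cookie');
    $ch = curl_init($url);
    curl_setopt_array($ch, buildCurlOptions($userAgent, $cookieFile));

    $body = curl_exec($ch);   // false on failure
    $result = [
        'body'      => $body === false ? null : $body,
        'error'     => $body === false ? curl_error($ch) : null,
        'final_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
        'status'    => curl_getinfo($ch, CURLINFO_HTTP_CODE),
    ];
    curl_close($ch);

    return $result;
}
```

Calling `fetchPage($url, USER_AGENTS['iphone'])` and checking `$result['error']` before echoing `$result['body']` shows whether the request failed, and `$result['final_url']` shows where the redirect ended up.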
Answer 1 (score: 4)
So you need:
CURLOPT_FOLLOWLOCATION
to follow the redirect and, as @TroyCheng indicated, CURLOPT_COOKIEFILE
& CURLOPT_COOKIEJAR
to keep cookies across it.
Answer 2 (score: 1)
Why not use a library like Buzz?
$request = new Buzz\Message\Request('GET', '/', 'http://google.com');
$response = new Buzz\Message\Response();
$client = new Buzz\Client\Curl();
// do not check https validity
$client->setVerifyPeer(false);
// define your user agent
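// pass the cURL constants themselves, not their names as strings
$client->setOption(CURLOPT_USERAGENT, $userAgent);
// cURL expects a file path for the cookie options, not true; $cookieFile is a placeholder path
$client->setOption(CURLOPT_COOKIEFILE, $cookieFile);
$client->setOption(CURLOPT_COOKIEJAR, $cookieFile);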
$client->send($request, $response);
if ($response->isOk())
{
echo $response->getContent();
    // or, if you want the DOM: toDomDocument() returns a DOMDocument, so serialize it before echoing
    echo $response->toDomDocument()->saveHTML();
}
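If the DOM is what you are after, PHP's built-in `DOMDocument` can also parse the HTML string that `curl_exec` returns, without an extra library. A minimal sketch, assuming the goal is to read the page title; `extractTitle` is a hypothetical helper name introduced here for illustration.

```php
<?php
// Hypothetical helper: return the <title> text of an HTML string, or null if there is none.
function extractTitle(string $html): ?string
{
    // loadHTML() rejects an empty string, so bail out early.
    if ($html === '') {
        return null;
    }

    $doc = new DOMDocument();
    // Real pages are rarely valid HTML; collect libxml warnings instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    $titles = $doc->getElementsByTagName('title');
    return $titles->length > 0 ? trim($titles->item(0)->textContent) : null;
}
```

For example, `extractTitle($kl)` would give the title of the page fetched into `$kl` by the plain cURL code earlier in the thread.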