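I am trying to fetch the content of this page: http://www.nytimes.com/2014/01/26/us/politics/rand-pauls-mixed-inheritance.html?hp&_r=0
I tried both the file_get_contents and the curl solutions, but each of them gives me an NYTimes login page, and I don't know why.
I have already tried the answers to these questions: "file_get_contents()/curl getting unexpected page", "PHP file_get_contents() behaves differently to browser", and "file_get_content get the wrong web".
Is there any solution? Thanks.
Edit: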
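// This is the cURL code I use ($link and $timeout are assumed to be set earlier)
$ch = curl_init(); // handle creation is not shown in the original snippet
$cookieJar = dirname(__FILE__) . '/cookie.txt';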
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieJar);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieJar);
curl_setopt($ch, CURLOPT_URL, $link);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12');
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$data = curl_exec($ch);
curl_close($ch);
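Answer 0 (score: 2)
Try testing it with the cookie file saved in the same directory as the script itself.
So set the cookie path like this:
$cookie = "cookie.txt";
This code works for me, and I get the page: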
<?php
function curl_get_contents($url)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$get_page = curl_get_contents("http://www.nytimes.com/2014/01/26/us/politics/rand-pauls-mixed-inheritance.html?hp&_r=1");
echo $get_page;
?>
Answer 1 (score: 1)
I think you need cURL to save the cookies. Try adding these lines to your cURL settings. This works for me:
$cookie = dirname(__FILE__) . "/cookie.txt"; // forward slash works on Windows and Unix alike
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
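Answer 2 (score: 0)
Use the Live HTTP Headers Firefox add-on to see what happens while the page is being loaded. There may be redirects, cookies being set, and so on. Then try to reproduce that behavior with PHP cURL (note: set the User-Agent and the other client headers to the same values the browser sends).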
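As a rough sketch of that idea, the snippet below collects browser-like options in one place and reports how the request actually ended (status code, final URL, number of redirects), which makes a redirect to a login page easy to spot. The function names build_browser_like_options and fetch_like_browser are hypothetical, and the Accept and Accept-Language values are only illustrative assumptions; copy the real header values from what your own browser sends.

```php
<?php
// Hypothetical helper: builds cURL options that imitate a browser request.
// The header values below are illustrative assumptions, not values known
// to be required by any particular site.
function build_browser_like_options(string $url, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // follow redirects like a browser does
        CURLOPT_MAXREDIRS      => 10,          // avoid endless redirect loops
        CURLOPT_COOKIEJAR      => $cookieFile, // write cookies here when the handle closes
        CURLOPT_COOKIEFILE     => $cookieFile, // read cookies back on later requests
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12',
        CURLOPT_HTTPHEADER     => [
            'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language: en-US,en;q=0.5',
        ],
    ];
}

// Hypothetical helper: performs the request and returns the body together
// with some diagnostics about what happened along the way.
function fetch_like_browser(string $url, string $cookieFile): array
{
    $ch = curl_init();
    curl_setopt_array($ch, build_browser_like_options($url, $cookieFile));
    $body = curl_exec($ch); // false on failure
    $info = [
        'http_code' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'final_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
        'redirects' => curl_getinfo($ch, CURLINFO_REDIRECT_COUNT),
        'error'     => curl_error($ch),
    ];
    curl_close($ch);
    return ['body' => $body, 'info' => $info];
}
```

Calling fetch_like_browser($link, dirname(__FILE__) . '/cookie.txt') and printing the 'info' part with print_r shows whether the final URL still points at the article or at a login page.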