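I'm a Lynda.com member, and I want to fetch an HTML page from their site and save it to my disk. The problem is that whenever I fetch the page with cURL, I get the non-member page (the one asking me to sign up), and I can't figure out why I'm not getting the member page :(
My code: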
get_remote_file_to_cache();

function get_remote_file_to_cache()
{
    $the_site = "http://www.lynda.com/AIR-3-0-tutorials/Flex-4-6-and-Mobile-Apps-New-Features/90366-2.html";
    $curl = curl_init();
    $fp = fopen("cache/temp_file.html", "w");
    curl_setopt($curl, CURLOPT_URL, $the_site);
    curl_setopt($curl, CURLOPT_COOKIE, '/cookie.txt');
    curl_setopt($curl, CURLOPT_FILE, $fp);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
    $http_headers = array(
        'Host: www.lynda.com',
        'User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0.2) Gecko/20100101 Firefox/6.0.2',
        'Accept: */*',
        'Accept-Language: en-us,en;q=0.5',
        'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7',
        'Connection: keep-alive'
    );
    curl_setopt($curl, CURLOPT_HEADER, true);
    curl_setopt($curl, CURLOPT_HTTPHEADER, $http_headers);
    curl_exec($curl);
    $httpCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    if ($httpCode == 404)
    {
        touch('cache/404_err.txt');
    }
    else
    {
        $contents = curl_exec($curl);
        fwrite($fp, $contents);
    }
    curl_close($curl);
}
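I'm on Windows 7 and running WAMP.
One thing I'm not sure about is whether the "cookie.txt" file is being read at all. I wasn't sure the path was correct, so I put a copy of cookie.txt in the server root as well as in the directory where I run this script.
Thanks in advance!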
----------- Found some code in the online manual ---------
// $url = page to POST data
// $ref_url = tell the server which page you came from (spoofing)
// $login = true will make a clean cookie-file.
// $proxy = proxy data
// $proxystatus = do you use a proxy ? true/false
function curl_grab_page($url, $ref_url, $data, $login, $proxy, $proxystatus) {
    if ($login == 'true') {
        $fp = fopen("ryanCookie.txt", "w");
        fclose($fp);
    }
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_COOKIEJAR, "ryanCookie.txt");
    curl_setopt($ch, CURLOPT_COOKIEFILE, "ryanCookie.txt");
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
    curl_setopt($ch, CURLOPT_TIMEOUT, 40);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    if ($proxystatus == 'true') {
        curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, TRUE);
        curl_setopt($ch, CURLOPT_PROXY, $proxy);
    }
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_REFERER, $ref_url);
    curl_setopt($ch, CURLOPT_HEADER, TRUE);
    curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_POST, TRUE);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
    ob_start();
    return curl_exec ($ch); // execute the curl command
    ob_end_clean();
    curl_close ($ch);
    unset($ch);
}
echo curl_grab_page("https://www.lynda.com/login/login.aspx", "http://www.lynda.com/", "simple_username=*******&simple_password=*******", "true", "null", "false")."done!";
But it still doesn't work :( This is the page where I got the code above: http://php.net/manual/en/function.curl-setopt.php
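Answer 0 (score: 1)
You need to understand how the web and HTTP work. When you visit a website, it usually gives you a cookie to track your state, and you start out as a visitor who is not logged in. When you submit the login form, the server updates your state to logged in and records that state through the cookie, either in a server-side session or in the browser itself.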
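To make the cookie mechanics concrete, here is a minimal sketch of a login request and a later page request sharing one cookie file. The helper names (`session_options`, `login_options`, `fetch`) and the form field names used in the usage note are my own assumptions for illustration; the real field names have to be read from the site's login form.

```php
<?php
// Hypothetical helper: options shared by every request in one "session".
// $cookieFile should be an absolute, writable path.
function session_options(string $url, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are saved here when the handle is released
        CURLOPT_COOKIEFILE     => $cookieFile, // cookies are loaded from here before the request
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // follow the redirect most login pages send
        CURLOPT_TIMEOUT        => 40,
    ];
}

// Hypothetical helper: the same options plus a form POST for the login step.
function login_options(string $loginUrl, array $fields, string $cookieFile): array
{
    return session_options($loginUrl, $cookieFile) + [
        CURLOPT_POST       => true,
        CURLOPT_POSTFIELDS => http_build_query($fields),
    ];
}

// Runs one request and returns the response body.
function fetch(array $options): string
{
    $ch = curl_init();
    curl_setopt_array($ch, $options);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    // $ch goes out of scope on return, which releases the handle
    // and lets cURL write the cookie jar.
    return $body;
}
```

You would call `fetch(login_options($loginUrl, ['username' => '...', 'password' => '...'], $jar))` once, then `fetch(session_options($memberPageUrl, $jar))` for each member page, with `$jar` pointing at the same file both times. If the login form also carries hidden fields or tokens, those would need to be posted as well.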
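Back to your question: since you want to access a member page, you first need to go through the following steps and understand how lynda.com handles login. The steps I give below are fairly general, though:
For more details, see the following resources: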
Answer 1 (score: 0)
Maybe you need to send an Authorization header, containing your username and password for the site, along with the other HTTP headers.
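As a rough sketch, this is what sending credentials via HTTP Basic authentication looks like with cURL. It only helps if the site actually uses HTTP authentication; a site with a form-based login page generally does not. The function names are hypothetical.

```php
<?php
// Hypothetical helper: let cURL build the Authorization header itself.
function basic_auth_options(string $url, string $user, string $password): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_HTTPAUTH       => CURLAUTH_BASIC,
        CURLOPT_USERPWD        => $user . ':' . $password,
        CURLOPT_RETURNTRANSFER => true,
    ];
}

// The equivalent header built by hand, for use with CURLOPT_HTTPHEADER.
// Basic auth is just base64("user:password"), so it needs HTTPS to be safe.
function basic_auth_header(string $user, string $password): string
{
    return 'Authorization: Basic ' . base64_encode($user . ':' . $password);
}
```

Either apply the first array with `curl_setopt_array()`, or add the string from `basic_auth_header()` to the array passed as `CURLOPT_HTTPHEADER`.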
Answer 2 (score: 0)
To get the member page, you need to be logged in to the site. To do that, you need to:
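Alternatively, you can try extracting the cookies from your browser after logging in and using them with curl_setopt($ch, CURLOPT_COOKIE, 'a=b;c=d'); but this may not work, because the site may also check your IP or session.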
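Note that CURLOPT_COOKIE expects the literal cookie string (`name=value` pairs separated by semicolons), not a file name; a path such as '/cookie.txt' belongs in CURLOPT_COOKIEFILE instead. A small sketch of building that string from cookies copied out of the browser, with hypothetical helper names and placeholder cookie names:

```php
<?php
// Hypothetical helper: joins name => value pairs into "a=b; c=d".
function cookie_header_value(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}

// Hypothetical helper: request options that send those cookies as-is.
function browser_cookie_options(string $url, array $cookies): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_COOKIE         => cookie_header_value($cookies),
        CURLOPT_RETURNTRANSFER => true,
    ];
}
```

For example, `browser_cookie_options($memberPageUrl, ['a' => 'b', 'c' => 'd'])` sends `Cookie: a=b; c=d`. The cookies stop working as soon as the browser session they came from expires.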