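I am writing a web scraper for a project. It needs to log in and save some pages, but after logging in and saving cookie.txt,
it redirects back to the login page. It looks like it is not logged in.
Here is my code: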
<?php
$ch = curl_init();
$cookie_file_path = 'cookie.txt';
$cookie_file_path = realpath($cookie_file_path);
$data = array();
$data['txtUser'] = "username";
$data['txtPass'] = "password";
$postStr = "";
foreach($data as $key=>$d){
$postStr .= $key.'='.urlencode($d).'&';
}
$postStr = substr($postStr,0,-1);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
curl_setopt($ch, CURLOPT_VERBOSE, TRUE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
$agent = $_SERVER["HTTP_USER_AGENT"];
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
//new ones
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_URL,"http://madstore.su/login.php");
curl_setopt($ch,CURLOPT_POST,TRUE);
curl_setopt($ch,CURLOPT_POSTFIELDS,$postStr);
curl_exec ($ch); // execute the curl command
echo 'Curl error: ' . curl_error($ch); // no errors
curl_close ($ch);
unset($ch);
$ch = curl_init();
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
curl_setopt($ch, CURLOPT_VERBOSE, TRUE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_URL,"http://madstore.su/index.php");
//new ones
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_exec ($ch);
echo 'Curl error: ' . curl_error($ch);
curl_close ($ch);
?>
I have read every related question on StackOverflow and searched Google for hours.
Here is what is in cookie.txt:
# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This file was generated by libcurl! Edit at your own risk.
#HttpOnly_.madstore.su TRUE / FALSE 1577145000 __cfduid d3f365e8218ab84f921e43db0d1500e7c1391327438626
madstore.su FALSE / FALSE 0 PHPSESSID t41g1j9cdl800e9qdj2pq96ef1
Here is the curl output:
HTTP/1.1 302 Found
Server: cloudflare-nginx
Date: Sun, 02 Feb 2014 08:01:40 GMT
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive
X-Powered-By: PHP/5.1.6
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
location: login.php
CF-RAY: f655ad243d007e5-LAX
HTTP/1.1 200 OK
Server: cloudflare-nginx
Date: Sun, 02 Feb 2014 08:01:41 GMT
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive
X-Powered-By: PHP/5.1.6
CF-RAY: f655ad5141f07e5-LAX
If anyone can help me solve this, I would be very grateful.
Answer 0 (score: 0)
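Your approach is wrong. First use curl to request the login page (http://madstore.su/login.php)
so that it can store the cookies in the file.
Then make the curl POST using the previously saved cookies. You are also missing this POST
parameter, so add it to your data: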
$data['btnLogin'] = "Log in";
Once the login has completed, use a final curl GET
to fetch the page you want.
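The three steps above can be sketched roughly as follows. This is a minimal illustration, not tested against the site: the URLs and the field names txtUser, txtPass and btnLogin come from the question and this answer, while the helper names (build_login_fields, fetch_with_cookies, login_and_fetch), the user agent string and the cookie path are hypothetical. It assumes PHP 7.1 or later with the curl extension.

```php
<?php
// Sketch only. Helper names, user agent and cookie path are assumptions.

// Form fields for the login POST, including the btnLogin parameter
// that the original request was missing.
function build_login_fields(string $user, string $pass): array
{
    return ['txtUser' => $user, 'txtPass' => $pass, 'btnLogin' => 'Log in'];
}

// Run one request that shares a cookie file with the other requests.
// Passing null for $postFields sends a plain GET.
function fetch_with_cookies(string $url, string $cookieFile, ?array $postFields = null): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); // cookies are read from this file
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);  // cookies are written here when the handle is closed
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);    // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; example-scraper)');
    if ($postFields !== null) {
        curl_setopt($ch, CURLOPT_POST, true);
        // http_build_query URL-encodes each value and joins the pairs with "&".
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($postFields));
    }
    $body = curl_exec($ch);
    if ($body === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException('curl failed: ' . $error);
    }
    curl_close($ch);
    return $body;
}

// The flow described in the answer: GET login page, POST credentials, GET target page.
function login_and_fetch(string $cookieFile, string $user, string $pass): string
{
    $base = 'http://madstore.su';
    // 1. Visit the login page first so the session cookie is stored.
    fetch_with_cookies($base . '/login.php', $cookieFile);
    // 2. POST the credentials, sending the cookie saved in step 1.
    fetch_with_cookies($base . '/login.php', $cookieFile, build_login_fields($user, $pass));
    // 3. Fetch the page you actually want, still using the same cookie file.
    return fetch_with_cookies($base . '/index.php', $cookieFile);
}
```

It could be called as `echo login_and_fetch(__DIR__ . '/cookie.txt', 'username', 'password');`. The path is built from `__DIR__` rather than passed through realpath(), because realpath() returns false when the file does not exist yet, which would leave curl without a usable cookie file on the first run.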