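I'm trying to log in to imdb.com with curl and then submit a movie rating. I know this goes against their ToS, but I'm not building an application or anything, just a small script for personal use. I'm new to curl, but I got the login part working using information from Stack Overflow. After logging in, I set the curl URL to http://www.imdb.com/ratings/_ajax/title, since that is where ratings are submitted. However, when I execute the curl request, no rating is submitted. I'm not sure how to solve this, so I'm hoping someone can point me in the right direction. Here is what I have so far: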
// options
$username = 'username';
$password = 'password';
$url_login = "https://secure.imdb.com/register-imdb/login";
$url_rating = "http://www.imdb.com/ratings/_ajax/title";
$headers[] = "Accept: */*";
$headers[] = "Connection: Keep-Alive";
$headers[] = "Content-Type: application/x-www-form-urlencoded";
$cookie_file_path = dirname(__FILE__)."/cookies.txt";
$agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 Safari/537.36";
// get login page
$ch = curl_init();
// basic curl options for all requests
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
curl_setopt($ch, CURLOPT_VERBOSE, 1);
// log
$verbose = fopen("loginfetch.txt", 'a+');
curl_setopt($ch, CURLOPT_STDERR, $verbose);
// set first URL
curl_setopt($ch, CURLOPT_URL, $url_login);
// execute session to get cookies and required form inputs
$return = curl_exec($ch);
// close connection
curl_close($ch);
//echo $return;
// grab the hidden inputs from the form required to login
$fields = getFormFields($return);
$fields['login'] = $username;
$fields['password'] = $password;
// set postfields using what we extracted from the form
$postfields = http_build_query($fields);
// post to login page
$ch = curl_init();
// set post options
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postfields);
curl_setopt($ch, CURLOPT_URL, $url_login);
curl_setopt($ch, CURLOPT_VERBOSE, 1);
// log
$verbose = fopen("loginpost.txt", 'a+');
curl_setopt($ch, CURLOPT_STDERR, $verbose);
// perform login
$return = curl_exec($ch);
// close connection
curl_close($ch);
//echo $return;
//submit rating
$data['tconst'] = 'tt1709143';
$data['rating'] = '5';
$data['tracking_tag'] = 'title-maindetails';
$post = http_build_query($data);
// post to submit page
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_URL, $url_rating);
curl_setopt($ch, CURLOPT_VERBOSE, 1);
// log
$verbose = fopen("ratingsubmit.txt", 'a+');
curl_setopt($ch, CURLOPT_STDERR, $verbose);
$return = curl_exec($ch);
// close connection
curl_close($ch);
//echo $return;
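With logging enabled I get three logs. The third one records the request to the IMDb rating page, which is the one that fails, as you can see in the log below: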
* About to connect() to www.imdb.com port 80 (#0)
* Trying 72.21.203.211... * connected
* Connected to www.imdb.com (72.21.203.211) port 80 (#0)
> POST /ratings/_ajax/title HTTP/1.1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 Safari/537.36
Host: www.imdb.com
Cookie: cs=ECxJtiNrhA/m+SuIKh15AweBbbqgkVqM8BkNuqOS/TKzsu7Z84JeKeCRXRoA0U26oKcq7CWRbbqj9TkMh9HN2eCRWyxAGW26oKdbraCRbbqgsW26oJFt+uDBHYqg==; cache=BCYqeti-w3RKC8bV21R-BwArPk4ILOkGu0T6E1oB5KGihmddDp_kluyca1x7QLflsfnEZ9smi6EZc2uHo7eY5FZeXfG4EQ97tKKFR8VhyAW4d4Q; id=BCYiSjnWuGQc8HDlo5OAY8cDzxQyS5nHJqLgwq_9yI08DAjTU5l0CeOXL8dUvE28QUv1MNlBQ0MD5jEzs8OuhUVQKukg_AtlD58ORFostzT-mCzLCuv8a_mOFztCRGX7V3rpONDCl_xyKHAEj2JLSnWHI8VbKrpes93j5xsgNtdgeU0oYH3s93XMeRVWOM06V1Lg; session-id-time=1551966508; session-id=357-4286508-9576651; uu=BCYvAfd_f2bQLnYdtpdRlYkDth4AKSl6zlKVXzyzSLlagoM-bH3kvZe3FLFOj_KmoWbEkh-dRXiPZZtStWC72Dbsd6jCQiNnXDAyxc-_vmzg5yiJLuwbKVF6nICv9xuwCV_Gn-_Ek8gqTujYDQPdgIWR2Y3aXArES1RzXoqX1pA9jkZ1EkWFkVKNaukvSqxPQRJhE50xfMNMwaUJLJ8SLA1WRsIVLqp873yNvZf7ecyLd4hgmC7AxdbfzPtDCdwgaelx
Accept: */*
Connection: Keep-Alive
Content-Type: application/x-www-form-urlencoded
Content-Length: 56
< HTTP/1.1 400 Bad Request
< Date: Sat, 08 Mar 2014 13:48:30 GMT
< Server: Server
< X-Frame-Options: SAMEORIGIN
< Content-Type: text/html;charset=UTF-8
< Content-Language: en-US
< Vary: Accept-Encoding,User-Agent
* Replaced cookie cache="BCYs4XPUvL_p2AL_pctQP7qEdwB9nBAXcIkiNxRZlqHtp9VjHkCy-GzvEIqsHCBHjjuGdWIyzZb1%0D%0Aip5WAl_SmYCtFg%0D%0A" for domain imdb.com, path /, expire 3541770158
< Set-Cookie: cache=BCYs4XPUvL_p2AL_pctQP7qEdwB9nBAXcIkiNxRZlqHtp9VjHkCy-GzvEIqsHCBHjjuGdWIyzZb1%0D%0Aip5WAl_SmYCtFg%0D%0A; Domain=.imdb.com; Expires=Thu, 26-Mar-2082 17:02:38 GMT; Path=/
< P3P: policyref="http://i.imdb.com/images/p3p.xml",CP="CAO DSP LAW CUR ADM IVAo IVDo CONo OTPo OUR DELi PUBi OTRi BUS PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA HEA PRE LOC GOV OTC "
< Cneonction: close
< Transfer-Encoding: chunked
<
* Connection #0 to host www.imdb.com left intact
* Closing connection #0
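Answer 0 (score: 2)
I figured out what was wrong. IMDb expects one more string to be submitted along with the movie ID and the rating. It is called "auth", and it is a string that appears on the movie's page. So I added a function that finds the auth string and passes it along when submitting the rating to IMDb. No more errors.
If anyone is interested, here is the whole (working) thing: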
// options
$username = 'username';
$password = 'password';
$url_login = "https://secure.imdb.com/register-imdb/login";
$url_rating = "http://www.imdb.com/ratings/_ajax/title";
$movie_id = "tt1800241";
$url_movie = "http://www.imdb.com/title/" . $movie_id;
$data['tconst'] = $movie_id;
$data['rating'] = '7';
$data['tracking_tag'] = 'title-maindetails';
$headers[] = "Accept: */*";
$headers[] = "Connection: Keep-Alive";
$headers[] = "Content-Type: application/x-www-form-urlencoded";
$cookie_file_path = dirname(__FILE__)."/cookies.txt";
$agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 Safari/537.36";
/**
Step 1: get login page and cookies
**/
$ch = curl_init();
// basic curl options for all requests
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
// log
curl_setopt($ch, CURLOPT_VERBOSE, 1);
$verbose = fopen("loginfetch.txt", 'a+');
curl_setopt($ch, CURLOPT_STDERR, $verbose);
// set URL
curl_setopt($ch, CURLOPT_URL, $url_login);
// execute session to get cookies and required form inputs
$return = curl_exec($ch);
// close connection
curl_close($ch);
//echo $return;
/**
Step 2: post login credentials
**/
// grab the hidden inputs from the form required to login
$fields = getFormFields($return);
$fields['login'] = $username;
$fields['password'] = $password;
// set postfields using what we extracted from the form
$postfields = http_build_query($fields);
// post to login page
$ch = curl_init();
// set post options
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postfields);
// log
curl_setopt($ch, CURLOPT_VERBOSE, 1);
$verbose = fopen("loginpost.txt", 'a+');
curl_setopt($ch, CURLOPT_STDERR, $verbose);
// set URL
curl_setopt($ch, CURLOPT_URL, $url_login);
// perform login
$return = curl_exec($ch);
// close connection
curl_close($ch);
//echo $return;
/**
Step 3: get Auth string from movie page
**/
$ch = curl_init();
// basic curl options for all requests
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
// log
curl_setopt($ch, CURLOPT_VERBOSE, 1);
$verbose = fopen("authfetch.txt", 'a+');
curl_setopt($ch, CURLOPT_STDERR, $verbose);
// set URL
curl_setopt($ch, CURLOPT_URL, $url_movie);
// execute session
$return_auth = curl_exec($ch);
// close connection
curl_close($ch);
//echo $return_auth;
/**
Step 4: submit rating
**/
$data['auth'] = getAuth($return_auth);
$post = http_build_query($data);
// post to submit page
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
// log
curl_setopt($ch, CURLOPT_VERBOSE, 1);
$verbose = fopen("ratingsubmit.txt", 'a+');
curl_setopt($ch, CURLOPT_STDERR, $verbose);
// set URL
curl_setopt($ch, CURLOPT_URL, $url_rating);
// execute session
$return = curl_exec($ch);
// close connection
curl_close($ch);
//echo $return;
function getFormFields($data)
{
    // extract the first POST form and return its inputs as name => value pairs
    if (preg_match('/(<form method="post.*?<\/form>)/is', $data, $matches)) {
        return getInputs($matches[1]);
    }
    // return an empty array (not an error string) so the caller can still add keys
    return array();
}
function getInputs($form)
{
    $inputs = array();
    $elements = preg_match_all('/(<input[^>]+>)/is', $form, $matches);
    if ($elements > 0) {
        for ($i = 0; $i < $elements; $i++) {
            $el = preg_replace('/\s{2,}/', ' ', $matches[1][$i]);
            if (preg_match('/name=(?:["\'])?([^"\'\s]*)/i', $el, $name)) {
                $name = $name[1];
                // keep '' as the default; a failed preg_match() would otherwise leave an empty array in $value
                $value = '';
                if (preg_match('/value=(?:["\'])?([^"\'\s]*)/i', $el, $valueMatch)) {
                    $value = $valueMatch[1];
                }
                $inputs[$name] = $value;
            }
        }
    }
    return $inputs;
}
// when submitting a rating to IMDb you also need to send an 'auth' string which we grab from the rating-list div on the movie details page
function getAuth($data)
{
    if (preg_match('/data-auth="(.*?)"/is', $data, $matches)) {
        $auth = $matches[1];
        return $auth;
    } else {
        return('Auth string not found.');
    }
}
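The auth lookup in this answer can also be written so that a missing token stops the script instead of being posted as the literal text 'Auth string not found.'. What follows is only a minimal sketch under assumptions: the function name extractAuthToken, the sample markup, and the token value EXAMPLE-TOKEN are made up for illustration. Only the data-auth attribute name and the tconst, rating, and tracking_tag fields are taken from the code above.

```php
<?php
// Hypothetical helper (not part of the original answer):
// returns the value of the first data-auth attribute, or null if none is found.
function extractAuthToken(string $html): ?string
{
    if (preg_match('/data-auth="([^"]*)"/i', $html, $m) && $m[1] !== '') {
        return $m[1];
    }
    return null;
}

// Made-up markup standing in for the movie page fetched in step 3.
$sampleHtml = '<div class="rating-list" data-auth="EXAMPLE-TOKEN"></div>';

$data = array(
    'tconst'       => 'tt1800241',
    'rating'       => '7',
    'tracking_tag' => 'title-maindetails',
);

$auth = extractAuthToken($sampleHtml);
if ($auth === null) {
    // Fail loudly rather than sending a request that is known to be incomplete.
    throw new RuntimeException('Auth string not found; rating not submitted.');
}

$data['auth'] = $auth;
$post = http_build_query($data);
echo $post, "\n";
```

Returning null keeps "not found" distinguishable from any real token, so the caller can decide whether to retry the page fetch or give up.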
Answer 1 (score: 0)
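Your cookies are empty when you submit the request. You need to close the curl handle after every curl_exec call:
curl_close($ch);
This writes the cookies to the file.
So in your code you need to close it three times. Make sure you initialize curl again after closing, and point it at the cookie file each time.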
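The pattern described here, one handle per request with the cookie jar written out when the handle is released, could be wrapped in a small helper. This is a sketch under assumptions rather than a tested IMDb client: the names buildRequestOptions and fetchWithCookies are hypothetical, and PHP's curl extension must be loaded.

```php
<?php
// Hypothetical helper: builds the option array shared by every request.
function buildRequestOptions(string $url, string $cookieFile, ?array $postData = null): array
{
    $options = array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEFILE     => $cookieFile, // cookies are read from this file
        CURLOPT_COOKIEJAR      => $cookieFile, // and written back when the handle is released
    );
    if ($postData !== null) {
        $options[CURLOPT_POST]       = true;
        $options[CURLOPT_POSTFIELDS] = http_build_query($postData);
    }
    return $options;
}

// Hypothetical helper: performs one request on a fresh handle and releases it.
function fetchWithCookies(string $url, string $cookieFile, ?array $postData = null)
{
    $ch = curl_init();
    curl_setopt_array($ch, buildRequestOptions($url, $cookieFile, $postData));
    $body = curl_exec($ch); // response body as a string, or false on failure
    curl_close($ch);        // before PHP 8 this releases the handle and writes the cookie jar
    return $body;           // on PHP 8+ the handle is released when $ch goes out of scope here
}
```

With helpers like these, each step in the question would become a single call such as fetchWithCookies($url_login, $cookie_file_path, $fields), and every call starts from the cookies the previous one saved.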