Submitting a movie rating to imdb.com with curl

Time: 2014-03-04 12:20:22

Tags: php curl login

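I'm trying to use curl to log in to imdb.com and then submit a movie rating. I know this goes against their ToS, but I'm not building an application or anything, just a small script for personal use. I'm new to curl, but I got the login part working using information from Stack Overflow. After logging in, I set the curl URL to http://www.imdb.com/ratings/_ajax/title, since that is where ratings are submitted. However, when I execute the curl request, no rating is submitted. I don't know how to fix this, so I'm hoping someone can point me in the right direction. Here is what I have so far: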

// options
$username           = 'username';
$password           = 'password';
$url_login          = "https://secure.imdb.com/register-imdb/login"; 
$url_rating         = "http://www.imdb.com/ratings/_ajax/title";
$headers[]          = "Accept: */*";
$headers[]          = "Connection: Keep-Alive";
$headers[]          = "Content-Type: application/x-www-form-urlencoded";
$cookie_file_path   = dirname(__FILE__)."/cookies.txt";
$agent              = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 Safari/537.36";


// get login page
$ch = curl_init(); 

// basic curl options for all requests
curl_setopt($ch, CURLOPT_HTTPHEADER,  $headers);
curl_setopt($ch, CURLOPT_HEADER,  0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);         
curl_setopt($ch, CURLOPT_USERAGENT, $agent); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); 
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path); 
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path); 
curl_setopt($ch, CURLOPT_VERBOSE, 1);
// log
$verbose = fopen("loginfetch.txt", 'a+');
curl_setopt($ch, CURLOPT_STDERR, $verbose);

// set first URL
curl_setopt($ch, CURLOPT_URL, $url_login);

// execute session to get cookies and required form inputs
$return = curl_exec($ch); 

// close connection
curl_close($ch);

//echo $return;


// grab the hidden inputs from the form required to login
$fields = getFormFields($return);
$fields['login'] = $username;
$fields['password'] = $password;

// set postfields using what we extracted from the form
$postfields = http_build_query($fields); 

// post to login page
$ch = curl_init(); 

// set post options
curl_setopt($ch, CURLOPT_HTTPHEADER,  $headers);
curl_setopt($ch, CURLOPT_HEADER,  0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);         
curl_setopt($ch, CURLOPT_USERAGENT, $agent); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); 
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path); 
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path); 
curl_setopt($ch, CURLOPT_POST, 1); 
curl_setopt($ch, CURLOPT_POSTFIELDS, $postfields); 
curl_setopt($ch, CURLOPT_URL, $url_login);
curl_setopt($ch, CURLOPT_VERBOSE, 1);
// log
$verbose = fopen("loginpost.txt", 'a+');
curl_setopt($ch, CURLOPT_STDERR, $verbose);

// perform login
$return = curl_exec($ch);  

// close connection
curl_close($ch);

//echo $return; 


//submit rating

$data['tconst'] = 'tt1709143';
$data['rating'] = '5';
$data['tracking_tag'] = 'title-maindetails';

$post = http_build_query($data); 

// post to submit page
$ch = curl_init(); 

curl_setopt($ch, CURLOPT_HEADER,  0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);         
curl_setopt($ch, CURLOPT_USERAGENT, $agent); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); 
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path); 
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path); 
curl_setopt($ch, CURLOPT_POST, 1);                                                                                            
curl_setopt($ch, CURLOPT_POSTFIELDS, $post);                                                                  
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);  
curl_setopt($ch, CURLOPT_URL, $url_rating);
curl_setopt($ch, CURLOPT_VERBOSE, 1);
// log
$verbose = fopen("ratingsubmit.txt", 'a+');
curl_setopt($ch, CURLOPT_STDERR, $verbose);

$return = curl_exec($ch);

// close connection
curl_close($ch);

//echo $return;

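With logging enabled I get three logs. The third one records the request to the IMDb rating page, which is the one that fails, as you can see in the log below: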

* About to connect() to www.imdb.com port 80 (#0)
*   Trying 72.21.203.211... * connected
* Connected to www.imdb.com (72.21.203.211) port 80 (#0)
> POST /ratings/_ajax/title HTTP/1.1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 Safari/537.36
Host: www.imdb.com
Cookie: cs=ECxJtiNrhA/m+SuIKh15AweBbbqgkVqM8BkNuqOS/TKzsu7Z84JeKeCRXRoA0U26oKcq7CWRbbqj9TkMh9HN2eCRWyxAGW26oKdbraCRbbqgsW26oJFt+uDBHYqg==; cache=BCYqeti-w3RKC8bV21R-BwArPk4ILOkGu0T6E1oB5KGihmddDp_kluyca1x7QLflsfnEZ9smi6EZc2uHo7eY5FZeXfG4EQ97tKKFR8VhyAW4d4Q; id=BCYiSjnWuGQc8HDlo5OAY8cDzxQyS5nHJqLgwq_9yI08DAjTU5l0CeOXL8dUvE28QUv1MNlBQ0MD5jEzs8OuhUVQKukg_AtlD58ORFostzT-mCzLCuv8a_mOFztCRGX7V3rpONDCl_xyKHAEj2JLSnWHI8VbKrpes93j5xsgNtdgeU0oYH3s93XMeRVWOM06V1Lg; session-id-time=1551966508; session-id=357-4286508-9576651; uu=BCYvAfd_f2bQLnYdtpdRlYkDth4AKSl6zlKVXzyzSLlagoM-bH3kvZe3FLFOj_KmoWbEkh-dRXiPZZtStWC72Dbsd6jCQiNnXDAyxc-_vmzg5yiJLuwbKVF6nICv9xuwCV_Gn-_Ek8gqTujYDQPdgIWR2Y3aXArES1RzXoqX1pA9jkZ1EkWFkVKNaukvSqxPQRJhE50xfMNMwaUJLJ8SLA1WRsIVLqp873yNvZf7ecyLd4hgmC7AxdbfzPtDCdwgaelx
Accept: */*
Connection: Keep-Alive
Content-Type: application/x-www-form-urlencoded
Content-Length: 56

< HTTP/1.1 400 Bad Request
< Date: Sat, 08 Mar 2014 13:48:30 GMT
< Server: Server
< X-Frame-Options: SAMEORIGIN
< Content-Type: text/html;charset=UTF-8
< Content-Language: en-US
< Vary: Accept-Encoding,User-Agent
* Replaced cookie cache="BCYs4XPUvL_p2AL_pctQP7qEdwB9nBAXcIkiNxRZlqHtp9VjHkCy-GzvEIqsHCBHjjuGdWIyzZb1%0D%0Aip5WAl_SmYCtFg%0D%0A" for domain imdb.com, path /, expire 3541770158
< Set-Cookie: cache=BCYs4XPUvL_p2AL_pctQP7qEdwB9nBAXcIkiNxRZlqHtp9VjHkCy-GzvEIqsHCBHjjuGdWIyzZb1%0D%0Aip5WAl_SmYCtFg%0D%0A; Domain=.imdb.com; Expires=Thu, 26-Mar-2082 17:02:38 GMT; Path=/
< P3P: policyref="http://i.imdb.com/images/p3p.xml",CP="CAO DSP LAW CUR ADM IVAo IVDo CONo OTPo OUR DELi PUBi OTRi BUS PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA HEA PRE LOC GOV OTC "
< Cneonction: close
< Transfer-Encoding: chunked
< 
* Connection #0 to host www.imdb.com left intact
* Closing connection #0

2 answers:

Answer 0 (score: 2)

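I figured out what was wrong. There was a missing string that IMDb expects to be submitted along with the movie ID and the rating. It's called "Auth", and it's a string that appears on the movie's page. So I added a function that finds the auth string, and I pass it along when submitting the rating to IMDb. No more errors.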
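To illustrate the idea in isolation, here is a minimal sketch of pulling the token out of a page and adding it to the POST body. The helper name findAuthToken and the sample markup are made up for this example; the only thing taken from the working script is that the token sits in a data-auth attribute. Unlike a version that returns an error message, this one returns null when nothing is found, so the caller can stop instead of posting a bogus token.

```php
<?php
// Hypothetical helper (not part of the original script): returns the value of
// the first data-auth attribute in $html, or null when none is present.
function findAuthToken($html)
{
    if (preg_match('/data-auth="([^"]*)"/i', $html, $m) === 1 && $m[1] !== '') {
        // Attribute values may be HTML-escaped in the page source.
        return html_entity_decode($m[1], ENT_QUOTES);
    }
    return null;
}

// Made-up markup for illustration only; the real page will look different.
$sample = '<div class="rating-list" data-auth="abc123" data-id="tt1800241"></div>';

$auth = findAuthToken($sample);
if ($auth === null) {
    // Stop here rather than submitting a rating without a token.
    echo "auth token not found\n";
} else {
    $post = http_build_query(array(
        'tconst' => 'tt1800241',
        'rating' => '7',
        'auth'   => $auth,
    ));
    echo $post, "\n"; // prints: tconst=tt1800241&rating=7&auth=abc123
}
```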

In case anyone is interested, here is the whole (working) thing:

// options
$username           = 'username';
$password           = 'password';
$url_login          = "https://secure.imdb.com/register-imdb/login"; 
$url_rating         = "http://www.imdb.com/ratings/_ajax/title";
$movie_id           = "tt1800241";
$url_movie          = "http://www.imdb.com/title/" . $movie_id;
$data['tconst']     = $movie_id;
$data['rating']     = '7';
$data['tracking_tag'] = 'title-maindetails';
$headers[]          = "Accept: */*";
$headers[]          = "Connection: Keep-Alive";
$headers[]          = "Content-Type: application/x-www-form-urlencoded";
$cookie_file_path   = dirname(__FILE__)."/cookies.txt";
$agent              = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 Safari/537.36";


/**
    Step 1: get login page and cookies
**/

$ch = curl_init(); 

// basic curl options for all requests
curl_setopt($ch, CURLOPT_HTTPHEADER,  $headers);
curl_setopt($ch, CURLOPT_HEADER,  0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);         
curl_setopt($ch, CURLOPT_USERAGENT, $agent); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); 
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path); 
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path); 
// log
curl_setopt($ch, CURLOPT_VERBOSE, 1);
$verbose = fopen("loginfetch.txt", 'a+');
curl_setopt($ch, CURLOPT_STDERR, $verbose);

// set URL
curl_setopt($ch, CURLOPT_URL, $url_login);

// execute session to get cookies and required form inputs
$return = curl_exec($ch); 

// close connection
curl_close($ch);

//echo $return;

/** 
    Step 2: post login credentials
**/

// grab the hidden inputs from the form required to login
$fields = getFormFields($return);
$fields['login'] = $username;
$fields['password'] = $password;

// set postfields using what we extracted from the form
$postfields = http_build_query($fields); 

// post to login page
$ch = curl_init(); 

// set post options
curl_setopt($ch, CURLOPT_HTTPHEADER,  $headers);
curl_setopt($ch, CURLOPT_HEADER,  0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);         
curl_setopt($ch, CURLOPT_USERAGENT, $agent); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); 
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path); 
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path); 
curl_setopt($ch, CURLOPT_POST, 1); 
curl_setopt($ch, CURLOPT_POSTFIELDS, $postfields); 
// log
curl_setopt($ch, CURLOPT_VERBOSE, 1);
$verbose = fopen("loginpost.txt", 'a+');
curl_setopt($ch, CURLOPT_STDERR, $verbose);

// set URL
curl_setopt($ch, CURLOPT_URL, $url_login);

// perform login
$return = curl_exec($ch);  

// close connection
curl_close($ch);

//echo $return; 

/**
    Step 3: get Auth string from movie page
**/

$ch = curl_init(); 

// basic curl options for all requests
curl_setopt($ch, CURLOPT_HTTPHEADER,  $headers);
curl_setopt($ch, CURLOPT_HEADER,  0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);         
curl_setopt($ch, CURLOPT_USERAGENT, $agent); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); 
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path); 
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path); 
// log
curl_setopt($ch, CURLOPT_VERBOSE, 1);
$verbose = fopen("authfetch.txt", 'a+');
curl_setopt($ch, CURLOPT_STDERR, $verbose);

// set URL
curl_setopt($ch, CURLOPT_URL, $url_movie);

// execute session
$return_auth = curl_exec($ch); 

// close connection
curl_close($ch);

//echo $return_auth;

/**
    Step 4: submit rating
**/

$data['auth'] = getAuth($return_auth);

$post = http_build_query($data); 

// post to submit page
$ch = curl_init(); 

curl_setopt($ch, CURLOPT_HEADER,  0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);         
curl_setopt($ch, CURLOPT_USERAGENT, $agent); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); 
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path); 
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path); 
curl_setopt($ch, CURLOPT_POST, 1);                                                                                            
curl_setopt($ch, CURLOPT_POSTFIELDS, $post);                                                                  
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
// log
curl_setopt($ch, CURLOPT_VERBOSE, 1);
$verbose = fopen("ratingsubmit.txt", 'a+');
curl_setopt($ch, CURLOPT_STDERR, $verbose);

// set URL
curl_setopt($ch, CURLOPT_URL, $url_rating);

// execute session
$return = curl_exec($ch);

// close connection
curl_close($ch);

//echo $return;



function getFormFields($data)
{
    if (preg_match('/(<form method="post.*?<\/form>)/is', $data, $matches)) {
        $inputs = getInputs($matches[1]);

        return $inputs;
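    } else {
        // Login form not found: return an empty array so the caller can still
        // add array keys (a string return value would break $fields['login'] = ...)
        return array();
    }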
}

function getInputs($form)
{
    $inputs = array();

    $elements = preg_match_all('/(<input[^>]+>)/is', $form, $matches);

    if ($elements > 0) {
        for($i = 0; $i < $elements; $i++) {
            $el = preg_replace('/\s{2,}/', ' ', $matches[1][$i]);

            if (preg_match('/name=(?:["\'])?([^"\'\s]*)/i', $el, $name)) {
                $name  = $name[1];
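                $value = '';

                // use a separate match variable: on a failed match preg_match()
                // resets its third argument to an empty array, which would
                // otherwise overwrite the '' default above
                if (preg_match('/value=(?:["\'])?([^"\'\s]*)/i', $el, $valueMatch)) {
                    $value = $valueMatch[1];
                }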

                $inputs[$name] = $value;
            }
        }
    }

    return $inputs;
}

// when submitting a rating to IMDb you also need to send an 'auth' string which we grab from the rating-list div on the movie details page
function getAuth($data)
{
    if (preg_match('/data-auth="(.*?)"/is', $data, $matches)) {
        $auth = $matches[1];

        return $auth;
    } else {
        return('Auth string not found.');
    }
}
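
The script above repeats the same block of curl_setopt calls for each of the four requests. As a possible tidy-up, not something the original script does, the shared options could be built once and applied with curl_setopt_array. The function name buildCurlOptions and its parameters are hypothetical. This sketch also leaves out the two CURLOPT_SSL_VERIFY* lines, so curl's default certificate checks stay on; that is a deliberate difference from the listing.

```php
<?php
// Hypothetical helper: builds the option set shared by every request.
// Pass $postFields (a URL-encoded string) to turn the request into a POST.
function buildCurlOptions($url, $cookieFile, $agent, array $headers, $postFields = null)
{
    $options = array(
        CURLOPT_URL            => $url,
        CURLOPT_HTTPHEADER     => $headers,
        CURLOPT_USERAGENT      => $agent,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEFILE     => $cookieFile, // cookies are read from this file
        CURLOPT_COOKIEJAR      => $cookieFile, // and written back when the handle is freed
    );

    if ($postFields !== null) {
        $options[CURLOPT_POST]       = true;
        $options[CURLOPT_POSTFIELDS] = $postFields;
    }

    return $options;
}

// Example: the option set for the rating request (values are placeholders).
$options = buildCurlOptions(
    'http://www.imdb.com/ratings/_ajax/title',
    dirname(__FILE__) . '/cookies.txt',
    'Mozilla/5.0',
    array('Accept: */*'),
    'tconst=tt1800241&rating=7'
);
```

Each step then shrinks to curl_init(), curl_setopt_array($ch, $options), curl_exec($ch) and closing the handle, with only the URL and POST body changing between steps.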

Answer 1 (score: 0)

Your cookies are empty when you submit the request. You need to close the curl handle after every curl_exec:

curl_close($ch);

This writes the cookies to the file.

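So for your code, you need to close it three times. Make sure you initialize curl again after closing, and make sure you point to the cookie file each time.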
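
An alternative to closing and reopening, sketched here as an assumption rather than something tested against IMDb: keep one handle for all requests. Once CURLOPT_COOKIEFILE is set, libcurl keeps cookies in memory on the handle and sends them on later requests, and the CURLOPT_COOKIEJAR file is only written when the handle is freed. Also note that since PHP 8.0 curl_close() no longer has any effect; the handle is freed when the last reference to it goes away. The function name runOnOneHandle and the shape of the $requests array are hypothetical.

```php
<?php
// Hypothetical helper: runs several requests on a single handle so that
// cookies set by an earlier response are sent with the later requests.
// $requests maps a label to array('url' => ..., 'post' => optional body).
function runOnOneHandle(array $requests, $cookieFile)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); // enables the cookie engine
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);  // written when the handle is freed

    $bodies = array();
    foreach ($requests as $label => $request) {
        curl_setopt($ch, CURLOPT_URL, $request['url']);
        if (isset($request['post'])) {
            curl_setopt($ch, CURLOPT_POST, true);
            curl_setopt($ch, CURLOPT_POSTFIELDS, $request['post']);
        } else {
            // switch back to GET in case the previous request was a POST
            curl_setopt($ch, CURLOPT_HTTPGET, true);
        }
        $bodies[$label] = curl_exec($ch); // response body, or false on failure
    }

    // Drop the last reference so the handle is freed and the cookie jar is
    // written; this works whether or not curl_close() still does anything.
    unset($ch);

    return $bodies;
}
```

The login flow needs values from one response to build the next request, so in practice the loop would be unrolled into separate steps on the same $ch; the fixed list here only keeps the sketch short.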