PHP login cURL code not working as expected

Date: 2018-03-12 23:18:11

Tags: php curl

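I'm trying to log in to a specific page using the cURL functions in PHP. Please check the code below. I connect to banggood.com with my email and password, and then I want to go on to another private page, but it doesn't work as expected. I don't get any errors. I use the code below to redirect to this page (https://www.banggood.com/index.php?com=account). After logging in, I want to access the private page where my orders are. Any help is appreciated.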

//The username or email address of the account.
define('EMAIL', 'aaa@gmail.com');

//The password of the account.
define('PASSWORD', 'mypassword');

//Set a user agent. This basically tells the server that we are using Chrome ;)
define('USER_AGENT', 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.2309.372 Safari/537.36');

//Where our cookie information will be stored (needed for authentication).
define('COOKIE_FILE', 'cookie.txt');

//URL of the login form.
define('LOGIN_FORM_URL', 'https://www.banggood.com/login.html');

//Login action URL. Sometimes, this is the same URL as the login form.
define('LOGIN_ACTION_URL', 'https://www.banggood.com/login.html');


//An associative array that represents the required form fields.
//You will need to change the keys / index names to match the name of the form
//fields.
$postValues = array(
    'email' => EMAIL,
    'password' => PASSWORD
);

//Initiate cURL.
$curl = curl_init();

//Set the URL that we want to send our POST request to. In this
//case, it's the action URL of the login form.
curl_setopt($curl, CURLOPT_URL, LOGIN_ACTION_URL);

//Tell cURL that we want to carry out a POST request.
curl_setopt($curl, CURLOPT_POST, true);

//Set our post fields / data (from the array above).
curl_setopt($curl, CURLOPT_POSTFIELDS, http_build_query($postValues));

//We don't want any HTTPS errors.
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);

//Where our cookie details are saved. This is typically required
//for authentication, as the session ID is usually saved in the cookie file.
curl_setopt($curl, CURLOPT_COOKIEJAR, COOKIE_FILE);

//Sets the user agent. Some websites will attempt to block bot user agents.
//Hence the reason I gave it a Chrome user agent.
curl_setopt($curl, CURLOPT_USERAGENT, USER_AGENT);

//Tells cURL to return the output once the request has been executed.
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

//Allows us to set the referer header. In this particular case, we are
//fooling the server into thinking that we were referred by the login form.
curl_setopt($curl, CURLOPT_REFERER, LOGIN_FORM_URL);

//Do we want to follow any redirects?
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, false);

//Execute the login request.
curl_exec($curl);

//Check for errors!
if(curl_errno($curl)){
    throw new Exception(curl_error($curl));
}

//We should be logged in by now. Let's attempt to access a password protected page
curl_setopt($curl, CURLOPT_URL, 'https://www.banggood.com/index.php?com=account&t=ordersList');

//Use the same cookie file.
curl_setopt($curl, CURLOPT_COOKIEJAR, COOKIE_FILE);

//Use the same user agent, just in case it is used by the server for session validation.
curl_setopt($curl, CURLOPT_USERAGENT, USER_AGENT);

//We don't want any HTTPS / SSL errors.
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);

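//Switch back to a GET request (CURLOPT_POST is still set from the login request),
//then execute it and print out the result (CURLOPT_RETURNTRANSFER makes curl_exec return it).
curl_setopt($curl, CURLOPT_HTTPGET, true);
echo curl_exec($curl);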

2 Answers:

Answer 0 (score: 20)

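You're doing several things wrong:

  1. You try to log in before you have a cookie session, but the site requires a cookie session before you send the login request.

  2. Your cookie session is tied to a CSRF token, named "at" on this site, which you need to parse out of the login page HTML and include in your login request. Your code doesn't fetch it.

  3. Most importantly, there is a captcha image tied to your cookie session that you need to fetch and solve, and you need to append the answer text to your login request. Your code ignores this completely.

  4. Your login request needs the header x-requested-with: XMLHttpRequest, but your code doesn't add that header.

  5. Your login request needs the fields com=account and t=submitLogin in the POST data, but your code adds neither of them. (You tried to add them to your URL, but they don't belong in the URL; they belong in the POST data, i.e. your $postValues array.)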
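Points 2 and 3 both come down to reading values out of the login page HTML. A minimal sketch with DOMDocument and DOMXPath, assuming the token input is still named "at" and the captcha image still has the id "get_login_image" (both taken from the sample code in this answer; the live site may have changed since). parse_login_page is a hypothetical helper name.

```php
<?php
// Hypothetical helper: extract the CSRF token ("at") and the captcha image URL
// from the login page HTML. The input name "at" and the id "get_login_image"
// come from the answer's sample code and may no longer match the live site.
function parse_login_page(string $html, string $baseUrl = 'https://www.banggood.com/'): array
{
    $dom = new DOMDocument();
    // Real-world markup is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $token = $xpath->query('//input[@name="at"]')->item(0);
    $image = $xpath->query('//*[@id="get_login_image"]')->item(0);
    if (!$token instanceof DOMElement || !$image instanceof DOMElement) {
        throw new RuntimeException('Login page layout not recognised');
    }

    // Resolve a relative src against the site root; absolute URLs are kept as-is.
    $src = $image->getAttribute('src');
    $captchaUrl = preg_match('#^https?://#i', $src) === 1
        ? $src
        : rtrim($baseUrl, '/') . '/' . ltrim($src, '/');

    return ['at' => $token->getAttribute('value'), 'captcha_url' => $captchaUrl];
}
```

Because the token and the captcha URL change with every cookie session, call this on the HTML returned by the same handle you will later use for the login POST.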

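  6. Here's what you need to do:

    • First, send a normal GET request to the login page. This gives you the session cookie ID, the CSRF token, and the URL of the captcha image.
    • Store the cookie ID and make sure you send it with every subsequent request, then parse out the CSRF token (in the HTML it looks like <input type="hidden" name="at" value="5aabxxx5dcac0" />) and the URL of the captcha image (it's different for every cookie session, so don't hardcode it).
    • Then fetch the captcha image, solve it, and put everything into the POST data of your login request: username, password, CSRF token, captcha answer, com and t. Add the HTTP header x-requested-with: XMLHttpRequest to the login request, send it to https://www.banggood.com/login.html, and you should be logged in!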
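The last step, the login POST itself, could be assembled as a plain cURL options array. This is a sketch under assumptions: the field names and values are copied from the sample code in this answer (which sends t=submitlogin in lower case, while point 5 writes submitLogin), and build_login_options is a hypothetical helper, not part of any library.

```php
<?php
// Hypothetical helper: cURL options for the login POST. Field names/values
// ("com", "t", "email", "pwd", "at", "login_image_code") mirror the answer's
// sample code; the live form may differ.
function build_login_options(string $email, string $password, string $csrfToken, string $captchaAnswer): array
{
    $fields = [
        'com'              => 'account',
        't'                => 'submitlogin',
        'email'            => $email,
        'pwd'              => $password,
        'at'               => $csrfToken,
        'login_image_code' => $captchaAnswer,
    ];

    return [
        // com and t travel in the POST body, not in the query string.
        CURLOPT_URL            => 'https://www.banggood.com/login.html',
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($fields),
        // The login endpoint expects an AJAX-style request.
        CURLOPT_HTTPHEADER     => ['X-Requested-With: XMLHttpRequest'],
        CURLOPT_RETURNTRANSFER => true,
    ];
}
```

Apply it with curl_setopt_array() on the same handle that fetched the login page and the captcha, so the session cookie from those requests is sent along with the POST.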

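    Here is an example implementation that uses hhb_curl for the web requests (a curl_ wrapper that handles cookies, turns silent curl_ errors into RuntimeExceptions, and so on), DOMDocument to parse out the CSRF token, and the deathbycaptcha.com API to break the captcha.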

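    PS: the example code won't work until you fill in the username/password of a real deathbycaptcha.com API account that has credit ($deathbycaptcha_username and $deathbycaptcha_password). Also, the captcha looks quite simple, and I think breaking it could be automated if you're motivated enough; I'm not. (Edit: it seems they've improved their captcha since I wrote this; it now looks very difficult. Also, the banggood account is just a temporary test account, so there's no harm in it being compromised, which obviously happens when I post its username/password here.)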
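The sample below relies on hhb_curl for two conveniences: keeping cookies between requests and turning silent curl_ failures into exceptions. If you'd rather use ext/curl directly, a minimal stand-in could look like this. The function names are hypothetical and this is not hhb_curl's API. Passing an empty string to CURLOPT_COOKIEFILE is libcurl's way of switching on its in-memory cookie engine without reading a file.

```php
<?php
// Hypothetical stand-in for two hhb_curl conveniences, using only ext/curl.

// One handle that remembers cookies for as long as it lives.
function new_session_handle()
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        // Empty string: enable libcurl's in-memory cookie engine, read no file.
        CURLOPT_COOKIEFILE     => '',
        CURLOPT_RETURNTRANSFER => true,
    ]);
    return $ch;
}

// Run a request on the handle; throw instead of silently returning false.
function exec_or_throw($ch, string $url): string
{
    curl_setopt($ch, CURLOPT_URL, $url);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException(sprintf('cURL error %d: %s', curl_errno($ch), curl_error($ch)));
    }
    return $body;
}
```

Create one handle with new_session_handle() and reuse it for the login page, the captcha image, and the login POST, so all three requests share the same cookie session.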

    <?php
    
    declare(strict_types = 1);
    require_once ('hhb_.inc.php');
    $banggood_username = 'igcpilojhkfhtdz@my10minutemail.com';
    $banggood_password = 'igcpilojhkfhtdz@my10minutemail.com';
    $deathbycaptcha_username = '?';
    $deathbycaptcha_password = '?';
    
    $hc = new hhb_curl ( '', true );
    $html = $hc->exec ( 'https://www.banggood.com/login.html' )->getStdOut ();
    // loadHTML() is an instance method; calling it statically fails on PHP 8.
    $domd = new DOMDocument ();
    @$domd->loadHTML ( $html );
    $xp = new DOMXPath ( $domd );
    $csrf_token = $xp->query ( '//input[@name="at"]' )->item ( 0 )->getAttribute ( "value" );
    $captcha_image_url = 'https://www.banggood.com/' . $domd->getElementById ( "get_login_image" )->getAttribute ( "src" );
    $captcha_image = $hc->exec ( $captcha_image_url )->getStdOut ();
    
    $captcha_answer = deathbycaptcha ( $captcha_image, $deathbycaptcha_username, $deathbycaptcha_password );
    
    $html = $hc->setopt_array ( array (
            CURLOPT_POST => 1,
            CURLOPT_POSTFIELDS => http_build_query ( array (
                    'com' => 'account',
                    't' => 'submitlogin',
                    'email' => $banggood_username,
                    'pwd' => $banggood_password,
                    'at' => $csrf_token,
                    'login_image_code' => $captcha_answer 
            ) ),
            CURLOPT_HTTPHEADER => array (
                    'x-requested-with: XMLHttpRequest' 
            ) 
    ) )->exec ()->getStdOut ();
    var_dump ( // $hc->getStdErr (),
    $html );
    
    function deathbycaptcha(string $imageBinary, string $apiUsername, string $apiPassword): string {
        $hc = new hhb_curl ( '', true );
        $response = $hc->setopt_array ( array (
                CURLOPT_URL => 'http://api.dbcapi.me/api/captcha',
                CURLOPT_POST => 1,
                CURLOPT_HTTPHEADER => array (
                        'Accept: application/json' 
                ),
                CURLOPT_POSTFIELDS => array (
                        'username' => $apiUsername,
                        'password' => $apiPassword,
                        'captchafile' => 'base64:' . base64_encode ( $imageBinary )  // use base64 because CURLFile requires a file, and i cba with tmpfile() .. but it would save bandwidth.
                ),
                CURLOPT_FOLLOWLOCATION => 0 
        ) )->exec ()->getStdOut ();
        $response_code = $hc->getinfo ( CURLINFO_HTTP_CODE );
        if ($response_code !== 303) {
            // some error
            $err = "DeathByCaptcha api returned \"$response_code\", expected 303, ";
            switch ($response_code) {
                case 403 :
                    $err .= " the api username/password was rejected";
                    break;
                case 400 :
                    $err .= " we sent an invalid request to the api (maybe the API specs have been updated?)";
                    break;
                case 500 :
                    $err .= " the api had an internal server error";
                    break;
                case 503 :
                    $err .= " api is temporarily unreachable, try again later";
                    break;
                default :
                    {
                        $err .= " unknown error";
                        break;
                    }
            }
            $err .= ' - ' . $response;
            throw new \RuntimeException ( $err );
        }
        $response = json_decode ( $response, true );
        if (! empty ( $response ['text'] ) && $response ['text'] !== '?') {
            return $response ['text']; // sometimes the answer might be available right away.
        }
        $id = $response ['captcha'];
        $url = 'http://api.dbcapi.me/api/captcha/' . urlencode ( $id );
        while ( true ) {
            sleep ( 10 ); // check every 10 seconds
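            // Make sure the poll is a GET: the handle may still carry CURLOPT_POST from the upload above.
            $response = $hc->setopt_array ( array (
                    CURLOPT_HTTPGET => true,
                    CURLOPT_HTTPHEADER => array (
                            'Accept: application/json' 
                    ) 
            ) )->exec ( $url )->getStdOut ();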
            $response = json_decode ( $response, true );
            if (! empty ( $response ['text'] ) && $response ['text'] !== '?') {
                return $response ['text'];
            }
        }
    }
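The deathbycaptcha() helper above polls forever (while (true) with a 10-second sleep). If you want the script to give up eventually, a bounded loop is one option. This is a generic sketch: $check stands for whatever fetches the current captcha status and returns the text or null, poll_until is a hypothetical name, and the attempt limit is an arbitrary illustrative value, not a DeathByCaptcha requirement.

```php
<?php
// Hypothetical bounded replacement for a "poll until solved" loop.
// $check returns the solved text, or null / '' / '?' while still pending.
function poll_until(callable $check, int $maxAttempts = 12, int $intervalSeconds = 10): string
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $result = $check();
        if ($result !== null && $result !== '' && $result !== '?') {
            return $result;
        }
        // Don't sleep after the final attempt.
        if ($attempt < $maxAttempts) {
            sleep($intervalSeconds);
        }
    }
    throw new RuntimeException("No answer after $maxAttempts attempts");
}
```

The '?' check mirrors the placeholder the sample treats as "not solved yet"; with the defaults, the loop gives up after roughly two minutes.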
    

Answer 1 (score: -2)

CURLOPT_FOLLOWLOCATION设置为1或true,您可能还需要CURLOPT_AUTOREFERER而不是静态参考。

Are you getting any cookies in the COOKIEJAR (cookie.txt)? Remember that the file must already exist and PHP needs write permission to it.
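A small pre-flight check along those lines, written as a hypothetical helper: it creates the cookie file if it's missing and fails loudly if PHP can't write to it. Returning an absolute path is a design choice, so a relative 'cookie.txt' doesn't silently resolve against an unexpected working directory.

```php
<?php
// Hypothetical helper: make sure the cookie file exists and is writable.
function prepare_cookie_file(string $path): string
{
    if (!file_exists($path) && !touch($path)) {
        throw new RuntimeException("Cannot create cookie file: $path");
    }
    if (!is_writable($path)) {
        throw new RuntimeException("Cookie file is not writable: $path");
    }
    // Hand cURL an absolute path.
    $real = realpath($path);
    return $real === false ? $path : $real;
}
```

Pass the returned path to CURLOPT_COOKIEJAR. CURLOPT_COOKIEJAR only controls where cookies are written; to read cookies saved by an earlier run back in, also set CURLOPT_COOKIEFILE to the same path.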

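If you're running PHP on localhost, a network sniffer can help debug the problem; try Wireshark or equivalent software, because the request may still be missing some important HTTP headers, such as Host.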
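As an alternative to a packet sniffer (a different technique from the one this answer suggests), cURL can report the exact request headers it sent. This also sidesteps the fact that the login page is served over HTTPS, so a sniffer only sees encrypted traffic unless TLS decryption is set up. exec_and_capture_request is a hypothetical helper built on the CURLINFO_HEADER_OUT option:

```php
<?php
// Hypothetical helper: execute the handle and return the body together with
// the raw request headers cURL actually sent.
function exec_and_capture_request($ch): array
{
    curl_setopt($ch, CURLINFO_HEADER_OUT, true); // record the outgoing request
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($ch);
    $sent = curl_getinfo($ch, CURLINFO_HEADER_OUT);
    return [
        'body'            => $body,
        // Not a string when nothing was sent, e.g. the connection failed.
        'request_headers' => is_string($sent) ? $sent : '',
    ];
}
```

Set CURLOPT_URL and the other options first, then print the request_headers entry and compare it with what a browser sends to the same URL.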