PHP cURL to Instagram returns strange results

Date: 2017-01-09 11:58:15

Tags: php curl instagram

    include_once('simple_html_dom.php');

    $usuario = "username";
    $password = "password";

    $url = 'https://www.instagram.com/';
    $url_login = 'https://www.instagram.com/accounts/login/ajax/';
    $user_agent = array("Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 ",
                  "(KHTML, like Gecko) Chrome/48.0.2564.103 Safari/537.36");

    $ch = curl_init(); 

    $headers = [
    'Accept-Encoding: gzip, deflate',
    'Accept-Language: en-US;q=0.6,en;q=0.4',
    'Connection: keep-alive',
    'Content-Length: 0',
    'Host: www.instagram.com',
    'Origin: https://www.instagram.com',
    'Referer: https://www.instagram.com/',
    'User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.103 Safari/537.36', 
    'X-Instagram-AJAX: 1',
    'X-Requested-With: XMLHttpRequest'  
    ];

    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($ch, CURLOPT_URL, $url);

    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
    curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
    curl_setopt($ch, CURLOPT_COOKIEFILE, "/tmp/cookie/pruebalogininsta2.txt");
    curl_setopt($ch, CURLOPT_REFERER, $sTarget);
    curl_setopt($ch, CURLOPT_HEADER, TRUE);

    $html = curl_exec($ch);

    preg_match_all('/^Set-Cookie:\s*([^;]*)/mi', $html, $matches);
    $cookies = array();
    foreach($matches[1] as $item) {
        parse_str($item, $cookie);
        $cookies = array_merge($cookies, $cookie);
    }


    $headers = [
    'Accept-Encoding: gzip, deflate',
    //'Accept-Language: ru-RU,ru;q=0.8,en-US;q=0.6,en;q=0.4',
    'Accept-Language: en-US;q=0.6,en;q=0.4',
    'Connection: keep-alive',
    'Content-Length: 0',
    'Host: www.instagram.com',
    'Origin: https://www.instagram.com',
    'Referer: https://www.instagram.com/',
    'User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.103 Safari/537.36', 
    'X-Instagram-AJAX: 1',
    'X-Requested-With: XMLHttpRequest'
    ];

    $cadena_agregar_vector = 'X-CSRFToken:'. $cookies["csrftoken"];

    $headers[] = $cadena_agregar_vector ;

    $sPost =  "username=".$usuario . "&password=". $password ;

    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $sPost);
    curl_setopt($ch, CURLOPT_URL, $url_login);  

    $html2 = curl_exec($ch);

    curl_setopt($ch, CURLOPT_URL, "http://www.instagram.com/");  

    $html4 = curl_exec($ch);

    echo $html4;

This is what I get: [image]

2 answers:

Answer 0 (score: 1)

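The problem is the way you hard-coded Accept-Encoding: gzip, deflate. That makes curl send the encoding header, but it does not turn on curl's own decoding, so you get the raw compressed bytes back without them being decoded for you.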
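To see why the output looks like garbage, here is a minimal offline sketch. It assumes PHP's zlib extension is available, and looks_gzipped() is a hypothetical helper written for this illustration, not part of curl or of the asker's code. It fakes a compressed response body with gzencode() and checks for the two gzip magic bytes:

```php
<?php
// Stand-in for a response body the server compressed because we asked for gzip.
$original = '<html><body>Hello</body></html>';
$raw = gzencode($original);

// Hypothetical helper: a gzip stream starts with the magic bytes 0x1f 0x8b.
// Pass it the body only, not a response that still has HTTP headers in front.
function looks_gzipped(string $body): bool
{
    return strncmp($body, "\x1f\x8b", 2) === 0;
}

var_dump(looks_gzipped($raw));      // the compressed bytes, unreadable if echoed
var_dump(looks_gzipped($original)); // the plain markup

// Decoding by hand recovers the markup; curl can do this step itself
// once its decoding is switched on.
$decoded = gzdecode($raw);
var_dump($decoded === $original);
```

If the body you echo starts with those two bytes, you are most likely looking at an undecoded gzip stream rather than a broken page.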

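Remove 'Accept-Encoding: gzip, deflate' and add curl_setopt($ch, CURLOPT_ENCODING, 'gzip, deflate'); instead, and curl will decode the response for you (provided curl was compiled with gzip and deflate support). Better still, just use curl_setopt($ch, CURLOPT_ENCODING, '');. curl will then automatically list every encoding it supports, so you won't run into trouble on builds where curl was compiled without gzip support.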
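A minimal sketch of that change, assuming the curl extension is loaded. make_decoding_handle() is a hypothetical helper name, and the header list is trimmed to two entries from the question purely for brevity:

```php
<?php
// Hypothetical helper: build a handle that lets curl negotiate and decode
// compression itself instead of relying on a hand-written header.
function make_decoding_handle(string $url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    // Empty string: advertise every encoding this curl build supports
    // and decode the response body automatically.
    curl_setopt($ch, CURLOPT_ENCODING, '');

    // Note that there is no 'Accept-Encoding' entry in this array.
    curl_setopt($ch, CURLOPT_HTTPHEADER, [
        'Accept-Language: en-US;q=0.6,en;q=0.4',
        'X-Requested-With: XMLHttpRequest',
    ]);

    return $ch;
}

$ch = make_decoding_handle('https://www.instagram.com/');
```

Calling curl_exec($ch) on this handle should then return readable markup rather than compressed bytes; the request itself is left out here so the sketch runs without network access.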

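On an unrelated note, you probably want to use CURLOPT_USERAGENT instead of setting the User-Agent header by hand. A hand-written header has to be repeated every time you replace the CURLOPT_HTTPHEADER array, whereas CURLOPT_USERAGENT stays set on the handle until curl_close($ch).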

Edit: in the first revision of this post I wrote CURLOPT_POSTFIELDS instead of CURLOPT_ENCODING. Sorry, fixed.

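Edit 2: on another unrelated note, you are encoding the username/password incorrectly. Instead of $sPost = "username=".$usuario . "&password=". $password ;, use $sPost=http_build_query(array('username'=>$usuario,'password'=>$password));, otherwise accounts with & or = or NULL in the username or password won't work.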
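The difference is easy to check offline. In this sketch the credentials are made-up values chosen to contain the problem characters, and parse_str() stands in for what a server would do with the body:

```php
<?php
// Made-up credentials for illustration only.
$usuario  = 'user@example.com';
$password = 'p&ss=w0rd';

// Naive concatenation: the '&' and '=' inside the password split the body
// into extra fields.
$naive = "username=" . $usuario . "&password=" . $password;
parse_str($naive, $naiveParsed);
// $naiveParsed['password'] is now 'p', and a stray 'ss' => 'w0rd' field appears.

// http_build_query() percent-encodes each value before joining them.
$safe = http_build_query(['username' => $usuario, 'password' => $password]);
// $safe is 'username=user%40example.com&password=p%26ss%3Dw0rd'
parse_str($safe, $safeParsed);

var_dump($naiveParsed['password'] === $password); // bool(false)
var_dump($safeParsed['password'] === $password);  // bool(true)
```

Passing $safe to CURLOPT_POSTFIELDS keeps the password intact on the wire.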

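Answer 1 (score: 1)

The answer posted by @hanshenrik should be the accepted one. However, if you just want a simple solution that works, even though it is not the proper fix, remove 'Accept-Encoding: gzip, deflate' from the headers array.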