User information is lost when scraping

Time: 2012-09-26 12:46:10

Tags: php web-scraping

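I am logged in to a website that shows my username in the page header, which indicates that I am logged in.

When I scrape that page and display the result on my machine, the page header shows "Login" instead, meaning I still need to log in.

I think I am missing some cookie information that I need to take into account when scraping.

Is there a way I can read the cookies?

cURL code: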

function getString( $url ) {
    $ch = curl_init();
    curl_setopt( $ch, CURLOPT_URL, $url );
    curl_setopt( $ch, CURLOPT_RETURNTRANSFER, 1 );
    curl_setopt( $ch, CURLOPT_AUTOREFERER, true );
    curl_setopt( $ch, CURLOPT_COOKIESESSION, true );
    curl_setopt( $ch, CURLOPT_COOKIEJAR, 'cookie.txt' );
    $response = curl_exec( $ch );
    curl_close( $ch );
    return $response;
}

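1 answer:

Answer 0 (score: 1)

Your code does not work because the cookie file is given as a relative path rather than a full path. Make sure cookie.txt is writable, then try: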

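var_dump(getString("http://google.com"));

function getString($url) {
    $ch = curl_init();

    // Use an absolute path so cURL does not depend on the current working directory
    $cookie = __DIR__ . '/cookie.txt';
    touch($cookie); // create the file if it does not exist yet

    if (!is_writable($cookie)) {
        die("Can't write to cookie file");
    }

    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    // Starts a new cookie session: session cookies loaded from the file are ignored
    curl_setopt($ch, CURLOPT_COOKIESESSION, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);  // cookies are saved here when the handle is closed
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie); // cookies are read from here and sent with requests
    $response = curl_exec($ch);
    curl_close($ch);
    return $response;
}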
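
Note that a cookie file only holds cookies the server sets during cURL's own requests; cURL cannot see the session in your browser. If the site uses an ordinary form login, one possible approach is to submit the login form with cURL first and then fetch the page on the same handle. The following is only a minimal sketch under that assumption: the function names sessionOptions and fetchAsLoggedInUser, the URLs, and the form field names are all hypothetical and must be adapted to the real site, which may also require hidden form fields or tokens.

```php
<?php
/**
 * Hypothetical helper: cURL options that make one cookie file both the
 * source (COOKIEFILE) and the destination (COOKIEJAR) of cookies.
 */
function sessionOptions(string $cookieFile): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEFILE     => $cookieFile, // enables the cookie engine and loads saved cookies
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are written here when the handle is closed
    ];
}

/**
 * Hypothetical sketch: POST the login form, then GET a page with the same
 * handle so the cookies from the login response are sent along.
 * Returns the page body as a string, or false on failure.
 */
function fetchAsLoggedInUser(string $loginUrl, array $credentials, string $pageUrl, string $cookieFile)
{
    $ch = curl_init();
    curl_setopt_array($ch, sessionOptions($cookieFile));

    // Step 1: submit the login form (field names depend on the site).
    curl_setopt($ch, CURLOPT_URL, $loginUrl);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($credentials));
    if (curl_exec($ch) === false) {
        curl_close($ch);
        return false;
    }

    // Step 2: switch back to GET and request the protected page.
    curl_setopt($ch, CURLOPT_URL, $pageUrl);
    curl_setopt($ch, CURLOPT_HTTPGET, true);
    $html = curl_exec($ch);

    curl_close($ch);
    return $html;
}

// Example call (all values are placeholders):
// $html = fetchAsLoggedInUser(
//     'https://example.com/login',
//     ['username' => 'me', 'password' => 'secret'],
//     'https://example.com/profile',
//     __DIR__ . '/cookie.txt'
// );
```

Reusing a single handle keeps the cookies in memory between the two requests, so the second request does not depend on the file having been written yet.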

Contents of cookie.txt after the getString("http://google.com") call:

# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This file was generated by libcurl! Edit at your own risk.

.google.com TRUE    /   FALSE   1411737249  PREF    ID=ff7979720d6a1237:FF=0:TM=1348665249:LM=1348665249:S=bRYSIBSW9Cd7PKOr
#HttpOnly_.google.com   TRUE    /   FALSE   1364476449  NID 64=tcm3RUM8R_1ch9eD6tuFi4lObBjSNdxqwMHbpchYCQoUpghIjZbiNw8AdAm0buTAVF0SqUsZsYEs7PAWhJdhutO11EQ9y8iXwuQ9dsPmdWlt86BAa7hxRqQcjSoX9Bep
.google.com.ng  TRUE    /   FALSE   1411737252  PREF    ID=9428863ec2e741f5:FF=0:TM=1348665252:LM=1348665252:S=s7wtyWMM9OnRYoE4
#HttpOnly_.google.com.ng    TRUE    /   FALSE   1364476452  NID 64=Gyszb-4_10nzvSU6kGzBj5UQRTnB7purbAH0reBytKi_pn9m3R-0BXGBEjrkmMBmOYfFpfIQOYLaCgi5LfKOIcnPCrTpTpV9LVld-Xf9pq7U7W5QaZ63a_yHIG9Vmcir
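
To read the cookies from PHP, the file can be parsed line by line. In the real file each cookie line has seven tab-separated fields (domain, include-subdomains flag, path, secure flag, expiry timestamp, name, value), and libcurl prefixes HttpOnly cookies with "#HttpOnly_". Here is a minimal sketch of such a parser; the function name parseCookieFile and the array keys are illustrative choices, not part of cURL.

```php
<?php
/**
 * Parse the text of a Netscape-format cookie file (as written by libcurl)
 * into a list of associative arrays, one per cookie.
 */
function parseCookieFile(string $contents): array
{
    $cookies = [];
    foreach (preg_split('/\R/', $contents) as $line) {
        $httpOnly = false;
        if (strpos($line, '#HttpOnly_') === 0) {
            // HttpOnly cookies look like comments but carry real data.
            $httpOnly = true;
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif ($line === '' || $line[0] === '#') {
            continue; // blank line or ordinary comment
        }

        $fields = explode("\t", $line);
        if (count($fields) !== 7) {
            continue; // not a well-formed cookie line
        }

        [$domain, $subdomains, $path, $secure, $expires, $name, $value] = $fields;
        $cookies[] = [
            'domain'             => $domain,
            'include_subdomains' => $subdomains === 'TRUE',
            'path'               => $path,
            'secure'             => $secure === 'TRUE',
            'expires'            => (int) $expires, // Unix timestamp, 0 for session cookies
            'name'               => $name,
            'value'              => $value,
            'http_only'          => $httpOnly,
        ];
    }
    return $cookies;
}

// Example: list the names of the cookies saved next to this script, if any.
$file = __DIR__ . '/cookie.txt';
if (is_readable($file)) {
    foreach (parseCookieFile(file_get_contents($file)) as $cookie) {
        echo $cookie['domain'], ' ', $cookie['name'], "\n";
    }
}
```

For a handle that is still open, curl_getinfo() with CURLINFO_COOKIELIST should return the cookies it currently holds as lines in this same format, so the per-line logic above could be reused there.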