I can't scrape personal profiles

Time: 2017-09-25 06:32:23

Tags: php scraper

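I found a PHP script that can fetch company profile pages here: https://stackoverflow.com/questions/42329819/how-can-i-scrape-linkedin-company-pages-with-curl-and-php-no-csrf-token-found-i#= I replaced the user agent with my own. It scrapes the company profile page successfully, but when I try to scrape a personal profile page it gives me the following error. Can anyone point me in a direction to solve it?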

Unauthorized

You must be authenticated to access this page.

Here is the script:

<?php
function fetch_value($str, $find_start = '', $find_end = '')
{
    if ($find_start == '')
    {
        return '';
    }
    $start = strpos($str, $find_start);
    if ($start === false)
    {
        return '';
    }
    $length = strlen($find_start);
    $substr = substr($str, $start + $length);
    if ($find_end == '')
    {
        return $substr;
    }
    $end = strpos($substr, $find_end);
    if ($end === false)
    {
        return $substr;
    }
    return substr($substr, 0, $end);
}

$linkedin_login_page = "https://www.linkedin.com/uas/login";
$linkedin_ref = "https://www.linkedin.com";

$username = 'sample@gmail.com';
$password = 'sample';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $linkedin_login_page);
curl_setopt($ch, CURLOPT_REFERER, $linkedin_ref);
curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.91 Safari/537.36');
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // write the response to a variable
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects if any
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE); // note: disables TLS certificate verification
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt');

$login_content = curl_exec($ch);

if(curl_error($ch)) {
  echo 'error:' . curl_error($ch);
}

$var = array(
    'isJsEnabled' => 'false',
    'source_app' => '',
    'clickedSuggestion' => 'false',
    'session_key' => trim($username),
    'session_password' => trim($password),
    'signin' => 'Sign In',
    'session_redirect' => '',
    'trk' => '',
    'fromEmail' => '');
$var['loginCsrfParam'] = fetch_value($login_content, 'type="hidden" name="loginCsrfParam" value="', '"');
$var['csrfToken'] = fetch_value($login_content, 'type="hidden" name="csrfToken" value="', '"');
$var['sourceAlias'] = fetch_value($login_content, 'input type="hidden" name="sourceAlias" value="', '"');

$post_array = array();
foreach ($var as $key => $value)
{
    $post_array[] = urlencode($key) . '=' . urlencode($value);
}
$post_string = implode('&', $post_array);

curl_setopt($ch, CURLOPT_URL, "https://www.linkedin.com/uas/login-submit");
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_string);

$store = curl_exec($ch);

if (stripos($store, "session_password-login-error") !== false)
{
    $err = trim(strip_tags(fetch_value($store, '<span class="error" id="session_password-login-error">', '</span>')));
    echo "Login error : ".$err;
}
elseif (stripos($store, 'profile-nav-item') !== false)
{
    // curl_setopt($ch, CURLOPT_URL, 'https://www.linkedin.com/company/1113675/');
    curl_setopt($ch, CURLOPT_URL, 'https://www.linkedin.com/in/wcan01/');
    curl_setopt($ch, CURLOPT_POST, false);
    curl_setopt($ch, CURLOPT_POSTFIELDS, "");
    $content = curl_exec($ch);
    curl_close($ch);

    $file = fopen("result.txt", 'w+'); // Create a new file, or overwrite the existing one.
    fwrite($file, $content);
    fclose($file);
}
else
{
    echo "unknown error";
}

?>
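
For debugging, the token extraction and POST-body encoding from the login step can be tried offline. The following is a minimal sketch, not a fix: the HTML fragment and token values are made up for illustration, the real login page markup may differ, and `fetch_value()` is restated in compact form only so the snippet runs on its own.

```php
<?php
// Compact restatement of fetch_value() from the script above.
function fetch_value($str, $find_start = '', $find_end = '')
{
    if ($find_start == '') {
        return '';
    }
    $start = strpos($str, $find_start);
    if ($start === false) {
        return '';
    }
    $substr = substr($str, $start + strlen($find_start));
    if ($find_end == '') {
        return $substr;
    }
    $end = strpos($substr, $find_end);
    return $end === false ? $substr : substr($substr, 0, $end);
}

// Made-up login form fragment (assumption); it deliberately lacks sourceAlias.
$sample_html = '<input type="hidden" name="loginCsrfParam" value="abc-123">'
             . '<input type="hidden" name="csrfToken" value="ajax:42">';

$fields = array(
    'session_key'    => 'sample@gmail.com',
    'loginCsrfParam' => fetch_value($sample_html, 'type="hidden" name="loginCsrfParam" value="', '"'),
    'csrfToken'      => fetch_value($sample_html, 'type="hidden" name="csrfToken" value="', '"'),
    'sourceAlias'    => fetch_value($sample_html, 'input type="hidden" name="sourceAlias" value="', '"'),
);

// An empty value means the marker string was not found in the page.
foreach ($fields as $name => $value) {
    echo $name, ' = ', ($value === '' ? '(missing)' : $value), PHP_EOL;
}

// For a flat array of strings, http_build_query() should give the same
// URL-encoded body as the manual urlencode() loop in the script.
$post_string = http_build_query($fields);
echo $post_string, PHP_EOL;
```

If any field prints as `(missing)` when run against the real login page content, the markers passed to `fetch_value()` no longer match the page and the login POST is likely to be rejected.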

1 answer:

Answer 0 (score: 1)

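I doubt it is possible to scrape individual profiles, given that it took another company months and huge legal fees before a judge ordered Microsoft (only last month) to allow them to scrape LinkedIn's public profiles.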

Here is an article about the ruling, and another about the complicated legal arguments.
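
Whether or not the profile page can be fetched, the script in the question writes whatever comes back to result.txt without checking it. A minimal sketch of a check follows, assuming the failure shows up either as an HTTP 401/403 status or as an error page containing the phrase below; `classify_response()` is a hypothetical helper, not part of the original script, and the English wording of the phrase is a guess.

```php
<?php
// Hypothetical helper: decide whether a fetched page looks usable.
function classify_response($status, $body)
{
    if ($status === 401 || $status === 403) {
        return 'unauthorized';
    }
    if ($status >= 400) {
        return 'http_error';
    }
    // Assumption: the error page contains this phrase even when the status is 200.
    if (stripos($body, 'You must be authenticated') !== false) {
        return 'unauthorized';
    }
    return 'ok';
}

// Intended use inside the script, before curl_close($ch):
//   $content = curl_exec($ch);
//   $status  = curl_getinfo($ch, CURLINFO_HTTP_CODE);
//   if (classify_response($status, (string) $content) !== 'ok') { /* report and stop */ }

// Offline examples with made-up inputs:
echo classify_response(401, ''), PHP_EOL;
echo classify_response(200, 'You must be authenticated to access this page.'), PHP_EOL;
echo classify_response(200, '<html>profile</html>'), PHP_EOL;
```

This only detects the failure; it does not make the request succeed.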