How can I scrape LinkedIn company pages with cURL and PHP?

Time: 2017-11-22 09:09:18

Tags: php curl

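I want to scrape some LinkedIn company pages with cURL and PHP, using login credentials. I tried the code below, but I get an error like this:

  Unauthorized
  You must be authenticated to access this page.

Before scraping a company page I have to log in to LinkedIn with a personal account via cURL, but the login does not seem to work.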

Instead of simple_html_dom, the code uses the fetch_value helper defined below to pull values out of the HTML.

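// Return the text in $str between $find_start and $find_end.
// Returns '' if $find_start is empty or not found; returns everything after
// $find_start if $find_end is empty or not found.
function fetch_value($str, $find_start = '', $find_end = '') {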
    if ($find_start == '') {
        return '';
    }
    $start = strpos($str, $find_start);
    if ($start === false) {
        return '';
    }
    $length = strlen($find_start);
    $substr = substr($str, $start + $length);
    if ($find_end == '') {
        return $substr;
    }
    $end = strpos($substr, $find_end);
    if ($end === false) {
        return $substr;
    }
    return substr($substr, 0, $end);
}

$linkedin_login_page = "https://www.linkedin.com/uas/login";
$linkedin_ref = "https://www.linkedin.com";
$username = 'username';
$password = 'password';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $linkedin_login_page);
curl_setopt($ch, CURLOPT_REFERER, $linkedin_ref);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7');
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
// Keep certificate verification on; disabling it exposes the credentials to interception.
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt');
$login_content = curl_exec($ch);
if (curl_error($ch)) {
  echo 'error:' . curl_error($ch);
}
$var = array(
    'isJsEnabled' => 'false',
    'source_app' => '',
    'clickedSuggestion' => 'false',
    'session_key' => trim($username),
    'session_password' => trim($password),
    'signin' => 'Sign In',
    'session_redirect' => '',
    'trk' => '',
    'fromEmail' => ''
);
$var['loginCsrfParam'] = fetch_value($login_content, 'type="hidden" name="loginCsrfParam" value="', '"');
$var['csrfToken'] = fetch_value($login_content, 'type="hidden" name="csrfToken" value="', '"');
$var['sourceAlias'] = fetch_value($login_content, 'input type="hidden" name="sourceAlias" value="', '"');
// Encode the form fields as an application/x-www-form-urlencoded body.
$post_string = http_build_query($var, '', '&');
// Submit the login form together with the hidden tokens collected above.
curl_setopt($ch, CURLOPT_URL, "https://www.linkedin.com/uas/login-submit");
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_string);
$store = curl_exec($ch);
if (stripos($store, "session_password-login-error") !== false) {
    // The response contains LinkedIn's login error element; show its message.
    $err = trim(strip_tags(fetch_value($store, '<span class="error" id="session_password-login-error">', '</span>')));
    echo "Login error : " . $err;
} elseif (stripos($store, 'profile-nav-item') !== false) {
    // Logged in: switch the handle back to GET before requesting the company page.
    // Setting CURLOPT_POSTFIELDS again, even to "", would make the request a POST.
    curl_setopt($ch, CURLOPT_URL, 'https://www.linkedin.com/company-beta/10667/?pathWildcard=10667');
    curl_setopt($ch, CURLOPT_HTTPGET, true);
    $content = curl_exec($ch);
    echo $content;
} else {
    echo "unknown error";
}
// Release the handle on every path, not only after a successful login.
curl_close($ch);
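
One thing that can be checked offline is whether the token extraction step works at all. The following is a minimal sketch that runs fetch_value against a made-up HTML fragment; the sample markup and token values are assumptions for illustration, not LinkedIn's actual login page. It shows that the helper only matches when the attributes appear in exactly the order and quoting given in the search string, so if the real page differs, the token comes back empty and the login POST is sent without it.

```php
<?php
// Same helper as in the question, repeated so this snippet runs on its own.
function fetch_value($str, $find_start = '', $find_end = '') {
    if ($find_start == '') {
        return '';
    }
    $start = strpos($str, $find_start);
    if ($start === false) {
        return '';
    }
    $substr = substr($str, $start + strlen($find_start));
    if ($find_end == '') {
        return $substr;
    }
    $end = strpos($substr, $find_end);
    if ($end === false) {
        return $substr;
    }
    return substr($substr, 0, $end);
}

// Hypothetical login-form fragment; the real page markup may differ.
$sample_html = '<form>'
    . '<input type="hidden" name="loginCsrfParam" value="abc-123">'
    . '<input value="xyz-789" name="csrfToken" type="hidden">'
    . '</form>';

// Attribute order matches the search string, so the value is extracted.
$login_csrf = fetch_value($sample_html, 'type="hidden" name="loginCsrfParam" value="', '"');

// Attribute order differs from the search string, so nothing is found.
$csrf_token = fetch_value($sample_html, 'type="hidden" name="csrfToken" value="', '"');

var_dump($login_csrf); // string(7) "abc-123"
var_dump($csrf_token); // string(0) ""
```

Dumping the three extracted values in the real script right after the first curl_exec call would show in the same way whether any of them is empty.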

Does anyone have any suggestions?

Thanks!

0 answers:

No answers