I am trying to get the follower count and the name from a LinkedIn company or profile page.
Current code:
$test = 'https://www.linkedin.com/company/zareklamy';

// Fetch $url with cURL; fall back to file_get_contents() if the cURL extension is missing.
// Pass 'NO_HEADER' as $header to send no custom request headers.
function file_get_contents_curl_linkedin($url, $timeout = 30, $options = array(), $header = array()) {
    if (!function_exists('curl_init')) {
        return file_get_contents($url);
    }
    if (empty($options)) {
        $options = array(
            CURLOPT_URL => $url,
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_SSL_VERIFYPEER => false, // skips certificate verification; unsafe outside testing
            CURLOPT_FOLLOWLOCATION => true,  // follows HTTP redirects only, not JavaScript ones
            CURLOPT_IPRESOLVE => CURL_IPRESOLVE_V4,
            CURLOPT_TIMEOUT => $timeout
        );
    }
    if (empty($header)) {
        $header = array(
            "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5",
            "Accept-Language: en-us,en;q=0.5",
            "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7",
            "Cache-Control: must-revalidate, max-age=0",
            "Connection: keep-alive",
            "Keep-Alive: 300",
            "Pragma: public"
        );
    }
    if ($header !== 'NO_HEADER') {
        $options[CURLOPT_HTTPHEADER] = $header;
    }
    $ch = curl_init();
    curl_setopt_array($ch, $options);
    $data = curl_exec($ch);
    curl_close($ch);
    // curl_exec() returns false on failure; normalise that to an empty string.
    return $data === false ? '' : $data;
}

$html = file_get_contents_curl_linkedin($test);
$html = htmlspecialchars($html);
echo $html;
The code above outputs:
<html><head>
<script type="text/javascript">
window.onload = function() {
  // Parse the tracking code from cookies.
  var trk = "bf";
  var trkInfo = "bf";
  var cookies = document.cookie.split("; ");
  for (var i = 0; i < cookies.length; ++i) {
    if ((cookies[i].indexOf("trkCode=") == 0) && (cookies[i].length > 8)) {
      trk = cookies[i].substring(8);
    } else if ((cookies[i].indexOf("trkInfo=") == 0) && (cookies[i].length > 8)) {
      trkInfo = cookies[i].substring(8);
    }
  }
  if (window.location.protocol == "http:") {
    // If "sl" cookie is set, redirect to https.
    for (var i = 0; i < cookies.length; ++i) {
      if ((cookies[i].indexOf("sl=") == 0) && (cookies[i].length > 3)) {
        window.location.href = "https:" + window.location.href.substring(window.location.protocol.length);
        return;
      }
    }
  }
  // Get the new domain. For international domains such as
  // fr.linkedin.com, we convert it to www.linkedin.com
  var domain = "www.linkedin.com";
  if (domain != location.host) {
    var subdomainIndex = location.host.indexOf(".linkedin");
    if (subdomainIndex != -1) {
      domain = "www" + location.host.substring(subdomainIndex);
    }
  }
  window.location.href = "https://" + domain + "/authwall?trk=" + trk + "&trkInfo=" + trkInfo + "&originalReferer=" + document.referrer.substr(0, 200) + "&sessionRedirect=" + encodeURIComponent(window.location.href);
}
</script>
</head></html>
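The response contains a window.location.href redirect to LinkedIn's /authwall page, and I cannot get past it with my file_get_contents_curl_linkedin function. Is there any way to create fake cookies with file_get_contents to fetch the data from the LinkedIn page?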
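For reference, the mechanism for sending cookies with file_get_contents is a stream context with a Cookie request header. The sketch below shows only that mechanism: build_cookie_context is a hypothetical helper name, the cookie names and values are placeholders, and nothing here is known to satisfy LinkedIn's auth wall.

```php
<?php
// Build a stream context that makes file_get_contents() send a Cookie header.
// Hypothetical helper: cookie names/values are placeholders supplied by the caller
// and are sent as-is, without any encoding.
function build_cookie_context(array $cookies, $timeout = 30) {
    $pairs = array();
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return stream_context_create(array(
        // The 'http' options are also used for https:// URLs.
        'http' => array(
            'method'  => 'GET',
            'timeout' => $timeout,
            'header'  => "Cookie: " . implode('; ', $pairs) . "\r\n" .
                         "Accept-Language: en-us,en;q=0.5\r\n",
        ),
    ));
}
```

It would be used as `$html = file_get_contents($test, false, build_cookie_context(array('name' => 'value')));`. With cURL, the equivalent is the CURLOPT_COOKIE option.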
How can I fix the code above so that it returns the full page content for the URL in $test?
I also tried using PhantomJS to get past it, but that did not help.
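One detail that may help while experimenting: CURLOPT_FOLLOWLOCATION follows HTTP redirects sent in response headers, not redirects performed by JavaScript, so the stub page comes back as an ordinary response. A minimal sketch for telling that stub apart from a real page follows. is_linkedin_authwall is a hypothetical helper, and its marker strings are taken only from the output shown above, so they are an assumption about what LinkedIn returns.

```php
<?php
// Return true when the fetched HTML looks like the JavaScript redirect stub
// (it assigns window.location.href and points at /authwall) rather than a real page.
// Hypothetical helper; the marker strings are assumptions based on the output above.
function is_linkedin_authwall($html) {
    return strpos($html, '/authwall?trk=') !== false
        && strpos($html, 'window.location.href') !== false;
}
```

Calling `is_linkedin_authwall($html)` before parsing lets the script report the auth wall instead of searching the stub for follower data.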