Question

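I want to scrape a company's LinkedIn profile without using the API. With the code below, the page gets redirected when I scrape it.

http://localnew/comapny is redirected to http://linkedin/company. How can I prevent this?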

<?php
error_reporting(E_ALL);
ini_set("display_errors", 1);

$cookie_file = "cookies.txt";
$url = 'https://www.linkedin.com/jobs/searchRefresh?keywords=Engineer&location=United%20States&locationId=us:0&refreshType=fullpage&trk=jobs_jserp_search_button_execute&searchOrigin=JSERP&applyLogin=';

$c = curl_init($url);
curl_setopt($c, CURLOPT_FRESH_CONNECT, 1);
// Store cookies in, and resend them from, the same file.
curl_setopt($c, CURLOPT_COOKIEJAR, $cookie_file);
curl_setopt($c, CURLOPT_COOKIEFILE, $cookie_file);
curl_setopt($c, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0");
curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
// Do not let cURL follow Location headers on its own.
curl_setopt($c, CURLOPT_FOLLOWLOCATION, 0);

$s = curl_exec($c);
// curl_getinfo() only has response data after curl_exec() has run.
$z = curl_getinfo($c);
curl_close($c);

echo "<pre>";
print_r($s);
exit;
?>

Answer 1

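Your question is hard to follow, but I'll do my best.

Possible causes in your case:

They detect that your request does not come from a real person. It is common for large sites to block spiders and crawlers.
The IP you are using belongs to a hosting company. These are often blacklisted.
The request is not recognized as coming from a logged-in user. A valid cookie file can solve this.

My suggestion is to switch to the real API.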
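
If you still want to find out which of the causes above applies, it may help to look at where the server is sending you rather than only printing the body. Below is a minimal sketch, assuming the standard PHP cURL extension. The helper names isRedirectStatus and fetchWithoutFollowing are hypothetical and made up for this illustration. It does not stop the site from issuing the redirect; it only reports the status code and the redirect target.

```php
<?php
// Sketch only: report a redirect instead of following it.
// Both helper names are hypothetical, not part of the cURL extension.

/** Returns true when the HTTP status code is in the 3xx (redirect) range. */
function isRedirectStatus(int $status): bool
{
    return $status >= 300 && $status < 400;
}

/**
 * Requests $url without following redirects.
 * Returns the status code, the redirect target ("" if none) and the body.
 */
function fetchWithoutFollowing(string $url, string $cookieFile): array
{
    $c = curl_init($url);
    curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($c, CURLOPT_FOLLOWLOCATION, false);
    curl_setopt($c, CURLOPT_COOKIEJAR, $cookieFile);
    curl_setopt($c, CURLOPT_COOKIEFILE, $cookieFile);

    $body = curl_exec($c);
    if ($body === false) {
        $error = curl_error($c);
        curl_close($c);
        throw new RuntimeException("cURL error: " . $error);
    }

    // Read the response details only after the request has run.
    $status = (int) curl_getinfo($c, CURLINFO_HTTP_CODE);
    // With FOLLOWLOCATION off, this is the URL the server redirected to.
    $target = (string) curl_getinfo($c, CURLINFO_REDIRECT_URL);
    curl_close($c);

    return ['status' => $status, 'redirect' => $target, 'body' => $body];
}
```

You could call fetchWithoutFollowing($url, "cookies.txt") and, when isRedirectStatus() returns true for the 'status' entry, print the 'redirect' entry. A target that points to a login or sign-up page would suggest the missing-session cause. Note that a blank page in the browser can also come from a JavaScript or meta-refresh redirect inside the returned HTML, which cURL options cannot suppress.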
