Collecting data with PHP cURL and a proxy

Asked: 2018-09-05 06:10:15

Tags: php curl web-scraping proxy

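I am using the following script to scrape data. When I open the PHP page, it only works for a few websites such as Google; for all other sites I just get a blank page. Is there something wrong with the code, and how can I debug it?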

<?php

$request = curl_init("https://www.google.com");
curl_setopt($request, CURLOPT_RETURNTRANSFER, true);
curl_setopt($request, CURLOPT_HTTPHEADER, array(
'Content-type: application/json',
'Authorization: Bearer 31d15a'
));

$response = curl_exec($request);
echo $response;
curl_close($request);

1 answer:

Answer 0 (score: 0)

There are more options you can set, but the ones below are probably enough for your task.

//Initialise Curl
$ch = curl_init();

//set the URL to be used (example value taken from the question)
$url = "https://www.google.com";
curl_setopt($ch, CURLOPT_URL, $url);

//follow HTTP 3xx redirects
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

//automatically update the referer header
curl_setopt($ch, CURLOPT_AUTOREFERER, true);

//return the response as a string from curl_exec() instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

//don't verify the peer's SSL certificate (insecure; acceptable for testing only)
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);

//set the User-Agent header so the request looks like it comes from a browser
curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');

//executes given cURL session.
$html = curl_exec($ch);

//suppress libxml warnings (only relevant if $html is parsed with DOMDocument afterwards)
libxml_use_internal_errors(true);

//closes Curl session, & frees up the associated memory
curl_close($ch);
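
As for the debugging part of the question: curl_exec() returns false when the request fails, and echoing false prints nothing, which is one common way to end up with a blank page. Below is a minimal sketch of how the failure reason could be surfaced. The helper name fetch_page(), its return array, the timeout values and the "host:port" proxy format are my own assumptions for illustration, not something from the original script.

```php
<?php

// Hypothetical helper (not from the original post): fetch a URL and report
// why the request failed instead of silently printing an empty string.
function fetch_page(string $url, ?string $proxy = null): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow HTTP 3xx redirects
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);   // assumed value, in seconds
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);          // assumed value, in seconds

    if ($proxy !== null) {
        // Assumed format "host:port", e.g. "127.0.0.1:8080" (placeholder).
        curl_setopt($ch, CURLOPT_PROXY, $proxy);
    }

    $body = curl_exec($ch);

    // Collect diagnostics while the handle is still available.
    // The handle is released when $ch goes out of scope on return.
    return [
        'ok'     => $body !== false,
        'body'   => $body === false ? '' : $body,
        'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE), // 0 if no response was received
        'errno'  => curl_errno($ch),                       // 0 means no cURL error
        'error'  => curl_error($ch),                       // human-readable message
    ];
}
```

You would call it as $result = fetch_page($url); and, when $result['ok'] is false, print $result['errno'] and $result['error'] rather than the body. If 'ok' is true but the page is still empty, $result['status'] shows whether the site answered with something like 403 or 503, which often points to the request headers or the User-Agent being rejected.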