Question

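I'm using the script below to scrape data. When I open the PHP page, it only works for a few sites such as Google; for the rest, I just get a blank page. Is there something wrong with the code, and how can I debug it?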

<?php

$request = curl_init("https://www.google.com");
curl_setopt($request, CURLOPT_RETURNTRANSFER, true);
curl_setopt($request, CURLOPT_HTTPHEADER, array(
'Content-type: application/json',
'Authorization: Bearer 31d15a'
));

$response = curl_exec($request);
echo $response;
curl_close($request);

Answer 1

There are more options you can set, but the following is probably enough for your task.

//Initialise Curl
$ch = curl_init();

//set the url to be used (here, the URL from the question)
$url = "https://www.google.com";
curl_setopt($ch, CURLOPT_URL, $url);

//follow HTTP 3xx redirects
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

//automatically update the referer header
curl_setopt($ch, CURLOPT_AUTOREFERER, true);

//return the response as a string instead of printing it directly
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

//don't verify the peer's SSL certificate (insecure; use only while debugging)
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);

//set the browser
curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');

//executes given cURL session.
$html = curl_exec($ch);

//disable libxml errors
libxml_use_internal_errors(TRUE); 

//closes Curl session, & frees up the associated memory
curl_close($ch);
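
The question also asks how to debug the blank page. When curl_exec() fails it returns false, and echoing false prints nothing, so a blank page often means the error is simply never shown. The following is a minimal sketch, not a definitive implementation, of how the failure reason could be surfaced. The function names describe_curl_result and fetch_with_diagnostics and the 15-second timeout are assumptions made for illustration; the cURL calls themselves (curl_errno, curl_error, curl_getinfo with CURLINFO_HTTP_CODE) are standard PHP.

```php
<?php
// Hypothetical helper: turns the raw cURL outcome into a readable diagnostic.
function describe_curl_result($body, int $errno, string $error, int $status): string
{
    if ($body === false) {
        // Transport-level failure: DNS, TLS, timeout, refused connection...
        return "cURL error {$errno}: {$error}";
    }
    if ($status >= 400) {
        // The request reached the server, but the server rejected it.
        return "HTTP {$status} returned by the server";
    }
    if ($body === '') {
        return "HTTP {$status} with an empty body";
    }
    return "OK: HTTP {$status}, " . strlen($body) . " bytes";
}

// Hypothetical wrapper: fetches a URL and returns [body, diagnostic message].
function fetch_with_diagnostics(string $url): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 15); // assumed value: give up after 15 seconds

    $body   = curl_exec($ch);   // false on failure
    $errno  = curl_errno($ch);  // 0 when no error occurred
    $error  = curl_error($ch);  // empty string when no error occurred
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    return [$body, describe_curl_result($body, $errno, $error, $status)];
}
```

Calling fetch_with_diagnostics("https://www.google.com") and printing the second element of the returned array shows whether the request failed at the transport level (for example a certificate or DNS problem) or whether the site answered with an error status.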
