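A client asked me to scrape a website using PHP cURL. I did the work, and the script runs fine on my localhost. However, when I handed the script over to the client, it did not work on his localhost.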
<?php
ini_set('display_errors', 'On');
error_reporting(E_ALL);
print "Cascading https://www.autotrader.ca/cars/on/toronto/?rcp=15&rcs=0&prx=100&prv=Ontario&loc=toronto%2C%20on&hprc=True&wcp=True&sts=New-Used&inMarket=basicSearch&mdl=Accent&make=Hyundai&scrladid=11543266:<p>";
$array = [];
$array[] = "/a/hyundai/accent/oshawa/ontario/19_11543266_/?showcpo=ShowCpo&ncse=no&orup=1_15_340&sprx=100";
$array[] = "/a/hyundai/accent/cambridge/ontario/5_48590586_20200220145456261/?showcpo=ShowCpo&ncse=no&orup=2_15_340&sprx=100";
$array[] = "/a/hyundai/accent/mississauga/ontario/19_11536424_/?showcpo=ShowCpo&ncse=no&orup=3_15_340&sprx=100";
foreach ($array as $key=>$value)
{
    $scrape = "https://www.autotrader.ca".$array[$key];
    print "Scraping $scrape<p>";
    echo "<br>";
    $user_agent = 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Mobile Safari/537.36';
    $headers = [
        'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'accept-encoding: gzip, deflate, br',
        'accept-language: en-US,en;q=0.9',
        'cache-control: max-age=0',
        'sec-fetch-dest: document',
        'sec-fetch-mode: navigate',
        'sec-fetch-site: none',
        'sec-fetch-user: ?1',
        'upgrade-insecure-requests: 1',
        'user-agent: Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Mobile Safari/537.36',
    ];
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $scrape);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
    curl_setopt($ch, CURLOPT_TIMEOUT, 100);
    curl_setopt($ch, CURLOPT_ENCODING, 1);
    curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($ch, CURLOPT_VERBOSE, true);
    // curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Length: 0'));
    curl_setopt($ch, CURLOPT_COOKIEJAR, dirname(__FILE__) . '/cookie.txt');
    curl_setopt($ch, CURLOPT_COOKIEFILE, dirname(__FILE__) . '/cookie.txt');
    $contents = curl_exec($ch);
    if ($contents === FALSE){
        echo "Error : ".curl_error($ch);
        echo "<br>";
        print "contents returned for $key = FALSE<br>";
    }
    curl_close($ch);
    // echo $contents;
    $start_pos = strpos($contents, "<title>", 0);
    $end_pos = strpos($contents, "</title>", 0);
    $title = substr($contents, $start_pos+7, $end_pos-$start_pos);
    print "Listing $key: $title<p>";
    echo "<br>";
    echo "<br>";
}
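He also told me that, before we switched to curl, he had been scraping the site with a different method, and he suspects the server has throttled or blocked his requests. Note, however, that he can still open the site in a browser. I also checked that when he replaces the URL in the curl call with a Google URL, he gets a correct response.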
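Answer 0 (score: 0)
The most likely problem here is that the PHP installation on your client's machine does not have the php-curl extension installed or enabled. How to fix that depends on the operating system and on how PHP was installed, but here are some common cases:
For Ubuntu or other Debian-based Linux distributions: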
apt-get install php7.4-curl
systemctl restart apache2
Replace '7.4' in the first command with the PHP version you are currently using.
For WAMP on Windows: How to enable curl in Wamp server
For XAMPP on Windows: How to enable cURL in PHP / XAMPP
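Before reinstalling anything, it may help to confirm whether the extension is actually loaded in the PHP that serves the script. The following is a minimal sketch; `curl_status` is a hypothetical helper name introduced here for illustration, not part of PHP. Run it through the same web server that runs the scraper, since the command-line PHP may load a different php.ini.

```php
<?php
// Hypothetical helper: summarize whether the cURL extension is usable.
function curl_status(): array
{
    $loaded = extension_loaded('curl');
    // curl_version() only exists when the extension is loaded
    $info = $loaded ? curl_version() : [];
    return [
        'loaded'  => $loaded,
        'version' => $info['version'] ?? null,      // libcurl version string
        'ssl'     => $info['ssl_version'] ?? null,  // TLS backend used for https
        'ini'     => php_ini_loaded_file(),         // php.ini path, or false if none
    ];
}

$status = curl_status();
echo $status['loaded']
    ? "cURL is enabled (libcurl {$status['version']}, {$status['ssl']})\n"
    : "cURL is NOT enabled\n";
echo "php.ini in use: " . ($status['ini'] ?: 'none') . "\n";
```

If this reports that cURL is enabled, the extension is probably not the cause and the problem lies elsewhere (for example, the network or the remote server).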
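Answer 1 (score: 0)
I ran it behind a proxy and it works fine. I simplified it and corrected a few small errors.
Try this, and don't forget to comment out or edit the CURLOPT_PROXY line.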
<?php
ini_set('display_errors', 'On');
error_reporting(E_ALL);

// Relative listing paths to fetch
$array = [
    "/a/hyundai/accent/oshawa/ontario/19_11543266_/?showcpo=ShowCpo&ncse=no&orup=1_15_340&sprx=100",
    "/a/hyundai/accent/cambridge/ontario/5_48590586_20200220145456261/?showcpo=ShowCpo&ncse=no&orup=2_15_340&sprx=100",
    "/a/hyundai/accent/mississauga/ontario/19_11536424_/?showcpo=ShowCpo&ncse=no&orup=3_15_340&sprx=100"
];

foreach ($array as $key => $value) {
    $scrape = "https://www.autotrader.ca" . $value;
    echo "Scraping " . $scrape . "<br>\n";

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $scrape);
    curl_setopt($ch, CURLOPT_PROXY, "http://<proxy_url>:80"); // Comment out if not behind a proxy
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // Skips certificate verification
    curl_setopt($ch, CURLOPT_COOKIEJAR, dirname(__FILE__) . '/cookie.txt');
    curl_setopt($ch, CURLOPT_COOKIEFILE, dirname(__FILE__) . '/cookie.txt');

    $contents = curl_exec($ch);
    if (curl_error($ch)) {
        echo "Error : " . curl_error($ch) . "<br>\n";
        curl_close($ch); // Release the handle before leaving the loop
        break;
    }
    curl_close($ch);

    // Take the text between <title> and </title>; fall back if the tag is missing
    $title = explode("<title>", $contents);
    $title = isset($title[1]) ? explode("</title>", $title[1])[0] : "(no title found)";
    echo "Listing " . $key . ": " . $title . "<br>\n";
    echo "<br>\n";
    echo "<br>\n";
}
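If the remote server answers with a block page or an empty body, the response may contain no title tag at all, and splitting on the literal string "<title>" will not find anything. As a sketch of a more tolerant extraction, assuming a hypothetical helper named `extract_title` (not part of the original script):

```php
<?php
// Hypothetical helper: return the page title, or null when none can be found.
function extract_title($html): ?string
{
    // curl_exec() returns false on failure, so reject non-strings up front
    if (!is_string($html) || $html === '') {
        return null;
    }
    // Case-insensitive; tolerates attributes on <title> and multi-line titles
    if (preg_match('~<title\b[^>]*>(.*?)</title>~is', $html, $m)) {
        return trim(html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8'));
    }
    return null;
}

// Example with a literal HTML string instead of a live response
$title = extract_title('<html><head><title> Hyundai Accent </title></head></html>');
echo $title ?? '(no title found)', "\n";
```

Inside the loop, this could replace the explode() calls with something like `$title = extract_title($contents) ?? '(no title found)';`, so a missing title shows up as a readable message rather than a PHP notice.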