I'm using fabpot/goutte 3.2 and trying to access a website with the code below, but it doesn't work:
$client = new \Goutte\Client();
$guzzleClient = new \GuzzleHttp\Client(array(
    'curl' => array(
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_SSL_VERIFYHOST => false,
        CURLOPT_SSL_VERIFYPEER => false,
    ),
));
$client->setClient($guzzleClient);
$crawler = $client->request('GET', "www.superpharm.pl/sklepy");
$crawler->filter('body')->each(function ($node) {
    print $node->text() . "\n";
});
I get this error:
In CurlFactory.php line 186:
[GuzzleHttp\Exception\ConnectException]
cURL error 7: Failed to connect to localhost port 80: Connection refused (see http://curl.haxx.se/libcurl/c/libcurl-errors.html)
This works:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "www.superpharm.pl/sklepy");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
$html = curl_exec($ch);
echo $html;
This also works (without the Goutte client):
$client = new \GuzzleHttp\Client();
$res = $client->request('GET', 'www.superpharm.pl/sklepy', ['verify' => false]);
echo $res->getBody();
Does anyone know why this doesn't work with Goutte?
Answer 0 (score: 1)
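The client that Goutte uses first tries to build an absolute URI from the $uri argument. Because you omitted the scheme (i.e. https://) from the URI, the client treats it as a relative path, resolves it against the default host localhost, and turns it into: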
http://localhost/www.superpharm.pl/sklepy
The solution is simply to change your URI so that it includes the scheme, like this:
$crawler = $client->request('GET', "https://www.superpharm.pl/sklepy");
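To see why the scheme matters, here is a rough sketch of the kind of resolution described above. The function resolveLikeBrowser() is a hypothetical helper written for illustration only; it is not part of Goutte or BrowserKit and it simplifies what the real client does. It assumes a default base of http://localhost/ when no earlier request has been made.

```php
<?php
// Hypothetical helper (not a Goutte/BrowserKit API): a simplified illustration
// of how a URI without a scheme ends up resolved against a default base.
function resolveLikeBrowser(string $uri, string $currentUri = 'http://localhost/'): string
{
    // A URI that already starts with http:// or https:// is absolute; keep it.
    if (preg_match('#^https?://#i', $uri)) {
        return $uri;
    }

    // Anything else is treated as a path relative to the current (default) URI.
    return rtrim($currentUri, '/') . '/' . ltrim($uri, '/');
}

// parse_url() finds no scheme or host in this string, only a path.
var_dump(parse_url('www.superpharm.pl/sklepy'));

echo resolveLikeBrowser('www.superpharm.pl/sklepy'), "\n";
echo resolveLikeBrowser('https://www.superpharm.pl/sklepy'), "\n";
```

The first echo prints http://localhost/www.superpharm.pl/sklepy, which matches the host in the "Failed to connect to localhost port 80" error; the second leaves the URI unchanged because the scheme is present.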