I am trying to scrape a website that actively blocks bots.
I use this code with PHP cURL to get around the block:
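$headers = array(
    'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Encoding: zip, deflate, sdch',
    'Accept-Language: en-US,en;q=0.8',
    'Cache-Control: max-age=0',
    // $user_agents is an array of browser User-Agent strings defined elsewhere
    'User-Agent: ' . $user_agents[array_rand($user_agents)],
);
$curl_init = curl_init();
curl_setopt($curl_init, CURLOPT_URL, $url);
curl_setopt($curl_init, CURLOPT_HTTPHEADER, $headers);
// Return the body from curl_exec() instead of printing it directly
curl_setopt($curl_init, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($curl_init);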
This works well.
However, I am using PHP Goutte, and I want to send the same request with that library:
use Goutte\Client;

$headers2 = array(
    'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Encoding' => 'zip, deflate, sdch',
    'Accept-Language' => 'en-US,en;q=0.8',
    'Cache-Control' => 'max-age=0',
    'User-Agent' => $user_agents[array_rand($user_agents)],
);
$client = new Client();
// Register each header on the Goutte client before sending the request
foreach ($headers2 as $key => $v) {
    $client->setHeader($key, $v);
}
// request() returns a DomCrawler instance for the fetched page
$resp = $client->request('GET', $url);
echo $resp->html();
But with this code, the website I am scraping blocks me.
How do I set headers correctly with Goutte?
Answer 0 (score: 2)
Can you check the status code of the response Goutte gets back?
$status_code = $client->getResponse()->getStatus();
echo $status_code;
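Alongside the status code, it may help to rule out a mismatch between the two header arrays in the question. A minimal sketch, assuming a hypothetical helper `toCurlHeaderLines()` (it is not part of cURL, Goutte, or Guzzle) that turns the associative form used with Goutte into the `Name: value` lines that `CURLOPT_HTTPHEADER` expects, so one array can feed both clients:

```php
<?php
// Hypothetical helper (not part of cURL, Goutte, or Guzzle): converts an
// associative header array into the "Name: value" lines that
// CURLOPT_HTTPHEADER expects.
function toCurlHeaderLines(array $headers): array
{
    $lines = [];
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }
    return $lines;
}

// Header set in the associative form used with Goutte's setHeader().
$headers = [
    'Accept-Language' => 'en-US,en;q=0.8',
    'Cache-Control'   => 'max-age=0',
];

// Prints the same headers as a list of "Name: value" strings.
print_r(toCurlHeaderLines($headers));
```

With a single source array, any remaining difference between the cURL and Goutte requests has to come from the client itself rather than from a typo in one of the two lists.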
Here is the source code that worked for me with Guzzle. In index.php:
<?php
ini_set('display_errors', 1);
?>
<html>
<head><meta charset="utf-8" /></head>
<?php
$begin = microtime(true);
require 'vendor/autoload.php';
require 'helpers/helper.php';
// Guzzle client with a cookie jar and browser-like default headers
$client = new GuzzleHttp\Client([
    'base_uri' => 'http://www.yellowpages.com.au',
    'cookies' => true,
    'headers' => [
        'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Encoding' => 'zip, deflate, sdch',
        'Accept-Language' => 'en-US,en;q=0.8',
        'Cache-Control' => 'max-age=0',
        'User-Agent' => 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:47.0) Gecko/20100101 Firefox/47.0',
    ],
]);
$helper = new Helper($client);
$mostViewed = $helper->getPageTest();
?>
</html>
In helper.php:
<?php
use GuzzleHttp\ClientInterface;
use Symfony\Component\DomCrawler\Crawler;

class Helper
{
    protected $client;
    protected $totalPages;

    public function __construct(ClientInterface $client)
    {
        $this->client = $client;
        $this->totalPages = 3;
    }

    // Send the search request; Guzzle URL-encodes the query values itself,
    // so they are given here in plain (unencoded) form
    public function query()
    {
        $queries = array(
            'clue' => 'Builders',
            'locationClue' => 'Sydney, 2000',
            'mappable' => 'true',
            'selectedViewMode' => 'list',
        );
        // print_r($queries);
        return $this->client->get('search/listings', array('query' => $queries));
    }

    // Fetch the search page, print the raw HTML, and stop
    public function getPageTest()
    {
        $responses = $this->query();
        $html = $responses->getBody()->getContents();
        echo $html;
        exit();
    }
}
?>
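One detail in `query()` worth checking: when the `query` option is an array, Guzzle encodes the values itself, so a pre-encoded value such as `Sydney%2C+2000` would be encoded a second time. The sketch below uses plain `http_build_query()` with `PHP_QUERY_RFC3986`, which is, to my knowledge, how Guzzle serializes an array query; treat that as an assumption and check it against your Guzzle version.

```php
<?php
// Value that is already percent-encoded, as it would appear in a URL.
$preEncoded = ['locationClue' => 'Sydney%2C+2000'];
// The same value in plain form.
$plain = ['locationClue' => 'Sydney, 2000'];

// Assumption: this mirrors how Guzzle serializes an array passed as 'query'.
$double = http_build_query($preEncoded, '', '&', PHP_QUERY_RFC3986);
$single = http_build_query($plain, '', '&', PHP_QUERY_RFC3986);

echo $double, PHP_EOL; // locationClue=Sydney%252C%2B2000
echo $single, PHP_EOL; // locationClue=Sydney%2C%202000
```

The first form reaches the server as the literal text `Sydney%2C+2000`, while the second decodes back to `Sydney, 2000`.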
With this code I got a result back.
Hope this helps!