Setting a proxy in Goutte

Asked: 2016-03-04 21:52:09

Tags: php web-scraping goutte

I tried to set a proxy by following Guzzle's documentation, but it doesn't work. Goutte's official GitHub page is dead, so I can't find anything there.

Does anyone know how to set a proxy?

This is what I tried:

$client = new Client();
$client->setHeader('User-Agent', $user_agent);
$crawler = $client->request('GET', $request, ['proxy' => $proxy]);

5 answers:

Answer 0 (score: 1)

You are on the right track, but in Goutte\Client::doRequest(), the Guzzle request is created like this:

$guzzleRequest = $this->getClient()->createRequest(
        $request->getMethod(),
        $request->getUri(),
        $headers,
        $body
);
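
The options are not passed when the request object is created. So if you want to use a proxy, override the doRequest() method of the Goutte\Client class and replace that code with: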

$guzzleRequest = $this->getClient()->createRequest(
        $request->getMethod(),
        $request->getUri(),
        $headers,
        $body,
        $request->getParameters()
);

Example of the overriding class:

<?php

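namespace igancev\override;

// The catch blocks below need these Guzzle 3 exception classes imported,
// otherwise the names resolve inside this namespace and never match.
use Guzzle\Http\Exception\BadResponseException;
use Guzzle\Http\Exception\CurlException;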

class Client extends \Goutte\Client
{
    protected function doRequest($request)
    {
        $headers = array();
        foreach ($request->getServer() as $key => $val) {
            $key = implode('-', array_map('ucfirst', explode('-', strtolower(str_replace(array('_', 'HTTP-'), array('-', ''), $key)))));
            if (!isset($headers[$key])) {
                $headers[$key] = $val;
            }
        }

        $body = null;
        if (!in_array($request->getMethod(), array('GET','HEAD'))) {
            if (null !== $request->getContent()) {
                $body = $request->getContent();
            } else {
                $body = $request->getParameters();
            }
        }

        $guzzleRequest = $this->getClient()->createRequest(
            $request->getMethod(),
            $request->getUri(),
            $headers,
            $body,
            $request->getParameters()
        );

        foreach ($this->headers as $name => $value) {
            $guzzleRequest->setHeader($name, $value);
        }

        if ($this->auth !== null) {
            $guzzleRequest->setAuth(
                $this->auth['user'],
                $this->auth['password'],
                $this->auth['type']
            );
        }

        foreach ($this->getCookieJar()->allRawValues($request->getUri()) as $name => $value) {
            $guzzleRequest->addCookie($name, $value);
        }

        if ('POST' == $request->getMethod() || 'PUT' == $request->getMethod()) {
            $this->addPostFiles($guzzleRequest, $request->getFiles());
        }

        $guzzleRequest->getParams()->set('redirect.disable', true);
        $curlOptions = $guzzleRequest->getCurlOptions();

        if (!$curlOptions->hasKey(CURLOPT_TIMEOUT)) {
            $curlOptions->set(CURLOPT_TIMEOUT, 30);
        }

        // Let BrowserKit handle redirects
        try {
            $response = $guzzleRequest->send();
        } catch (CurlException $e) {
            if (!strpos($e->getMessage(), 'redirects')) {
                throw $e;
            }

            $response = $e->getResponse();
        } catch (BadResponseException $e) {
            $response = $e->getResponse();
        }

        return $this->createResponse($response);
    }
}

Then try sending a request:

$client = new \igancev\override\Client();
$proxy = 'http://149.56.85.17:8080'; // free proxy example
$crawler = $client->request('GET', $request, ['proxy' => $proxy]);

Answer 1 (score: 0)

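You can create a custom Guzzle client and assign it to the Goutte client. When Goutte makes a request through Guzzle, it uses that client's default configuration, which is passed in through the Guzzle constructor.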

$guzzle = new \GuzzleHttp\Client(['proxy' => 'http://192.168.1.1:8080']);
$goutte = new \Goutte\Client();
$goutte->setClient($guzzle);
$crawler = $goutte->request($method, $url);
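
If HTTP and HTTPS traffic need different proxies, or some hosts should bypass the proxy, Guzzle's "proxy" option also accepts an array with "http", "https" and "no" keys. The following is only a sketch: buildProxyOption() is a hypothetical helper, the "localhost" bypass entry is an illustrative assumption, and the last part runs only if Guzzle and a Goutte version that provides setClient() are installed and autoloaded.

```php
<?php

// Hypothetical helper (not part of Guzzle or Goutte): builds the array form
// of Guzzle's "proxy" option. The HTTPS proxy falls back to the HTTP one.
function buildProxyOption(string $httpProxy, ?string $httpsProxy = null, array $noProxy = []): array
{
    $option = [
        'http'  => $httpProxy,
        'https' => $httpsProxy ?? $httpProxy,
    ];

    // Hosts listed under "no" are requested directly, without the proxy.
    if ($noProxy !== []) {
        $option['no'] = $noProxy;
    }

    return $option;
}

// Default configuration for the Guzzle client; the bypass host is an assumption.
$config = ['proxy' => buildProxyOption('http://192.168.1.1:8080', null, ['localhost'])];

// Wire the configuration into Goutte only when both libraries are available.
if (class_exists(\GuzzleHttp\Client::class) && class_exists(\Goutte\Client::class)) {
    $goutte = new \Goutte\Client();
    $goutte->setClient(new \GuzzleHttp\Client($config));
}
```

Keeping the proxy settings in the client's constructor means every request made through that Goutte client uses them, so nothing has to be passed per request.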

Answer 2 (score: 0)

You can pass it directly in a Goutte or Guzzle request:

$proxy = 'xx.xx.xx.xx:xxxx';

$goutte = new GoutteClient();
echo $goutte->request('GET', 'https://example.com/', ['proxy' => $proxy])->html();

The same approach works in Guzzle:

$Guzzle = new Client();
$GuzzleResponse = $Guzzle->request('GET', 'https://example.com/', ['proxy' => $proxy]);

Answer 3 (score: 0)

I solved the problem like this:

    $url = 'https://api.myip.com';
    $client = new \Goutte\Client;
    $client->setClient(new \GuzzleHttp\Client(['proxy' => 'http://xx.xx.xx.xx:8080']));
    $get_html = $client->request('GET', $url)->html();
    var_dump($get_html);

Answer 4 (score: 0)

For the latest versions, use:

The Goutte client instance (which extends Symfony\Component\BrowserKit\HttpBrowser):

use Symfony\Component\HttpClient\HttpClient;
use Goutte\Client;

$client = new Client(HttpClient::create(['proxy' => 'http://xx.xx.xx.xx:80']));
...