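Currently, we've found that our multi curl goes haywire when it can't connect to a proxy: a single failing proxy is enough for curl to fail the entire batch.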
The problem is that a timeout on a single proxy server can close all connections. This happens with three errors: `Connection timed out`, `Proxy CONNECT aborted due to timeout`, or `SSL connection timeout`.
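Curl then closes all connections and the batch fails.
Expected behavior: only the failed request is returned as failed; the remaining requests complete successfully and are returned.
Actual behavior: one failed request causes all connections to be closed, so the successful results are not returned. In the case shown below, only one result is returned as successful.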
Creating a single curl resource (`Config` is just an object holding configuration data):
```php
public function create(Config $config)
{
    $curlResource = curl_init($config->getUrl());
    curl_setopt($curlResource, CURLOPT_TIMEOUT, $config->getTimeout()); // whole transfer, in seconds
    curl_setopt($curlResource, CURLOPT_CONNECTTIMEOUT, 5);              // connect phase, in seconds
    curl_setopt($curlResource, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curlResource, CURLOPT_SSL_VERIFYPEER, $config->getSslCertificateValidation());
    curl_setopt($curlResource, CURLOPT_SSL_VERIFYHOST, $config->getSslCertificateValidation() ? 2 : 0);
    curl_setopt($curlResource, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($curlResource, CURLOPT_HEADER, true);
    curl_setopt($curlResource, CURLOPT_CUSTOMREQUEST, $config->getMethod());
    curl_setopt($curlResource, CURLOPT_VERBOSE, true);
    curl_setopt($curlResource, CURLOPT_USERAGENT, $config->getUserAgent());
    curl_setopt($curlResource, CURLOPT_MAXREDIRS, $config->getMaxRedirects());
    curl_setopt($curlResource, CURLOPT_HTTPHEADER, $this->processHeaders($config->getHeaders()));

    // Route the request through the configured (authenticated) proxy
    $proxyConfig = $config->getProxyConfig();
    curl_setopt($curlResource, CURLOPT_PROXY, $proxyConfig->getUrl());
    curl_setopt($curlResource, CURLOPT_PROXYPORT, $proxyConfig->getPort());
    curl_setopt($curlResource, CURLOPT_PROXYUSERPWD, $proxyConfig->getUsername().':'.$proxyConfig->getPassword());

    return $curlResource;
}
```
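For comparison, the same kind of setup can be written with `curl_setopt_array()`, which returns `false` if any option is rejected, so a misconfigured handle is caught before it joins the batch. This is a minimal sketch, assuming plain scalar parameters in place of the `Config`/`ProxyConfig` objects; `createProxiedHandle()` is a hypothetical name, and only a subset of the options above is shown.

```php
<?php
// Sketch only: a subset of create()'s options applied in one call.
// createProxiedHandle() and its scalar parameters are hypothetical stand-ins
// for the Config / ProxyConfig objects used in the question.
function createProxiedHandle(
    string $url,
    string $proxyHost,
    int $proxyPort,
    string $proxyUser,
    string $proxyPass,
    int $timeoutSeconds
) {
    $ch = curl_init($url);

    $ok = curl_setopt_array($ch, [
        CURLOPT_TIMEOUT        => $timeoutSeconds, // whole transfer, seconds
        CURLOPT_CONNECTTIMEOUT => 5,               // connect phase, seconds
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_HEADER         => true,
        CURLOPT_PROXY          => $proxyHost,
        CURLOPT_PROXYPORT      => $proxyPort,
        CURLOPT_PROXYUSERPWD   => $proxyUser . ':' . $proxyPass,
    ]);

    // curl_setopt_array() stops at the first option it cannot set
    if ($ok === false) {
        throw new RuntimeException('curl_setopt_array() rejected an option');
    }

    return $ch;
}
```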
Multi setup / client:
```php
$curlResources = []; // array of handles made by create(Config $config)
$mh = curl_multi_init();
foreach ($curlResources as $curlResource) {
    curl_multi_add_handle($mh, $curlResource);
}

$active = null;
do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);

while ($active && $mrc == CURLM_OK) {
    // Wait for activity on any curl connection
    if (curl_multi_select($mh) == -1) {
        usleep(1);
    }
    // Keep calling exec until curl is ready to give us more data
    do {
        $mrc = curl_multi_exec($mh, $active);
    } while ($mrc == CURLM_CALL_MULTI_PERFORM);
}

foreach ($curlResources as $index => $curlResource) {
    $errorMessage = curl_error($curlResource);
    if ($errorMessage !== '') {
        // process error
    } else {
        // process success, e.g. curl_multi_getcontent($curlResource)
    }
    curl_multi_remove_handle($mh, $curlResource);
    curl_close($curlResource);
}
curl_multi_close($mh);
```
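One way to see each request's own outcome, instead of reading `curl_error()` only after the whole batch has finished, is to drain completion messages with `curl_multi_info_read()`, whose `result` field is the CURLcode for that single handle. The sketch below assumes every handle has `CURLOPT_RETURNTRANSFER` set; `runMultiBatch()` is a hypothetical helper. It only reports per-handle results and does not by itself change how libcurl reacts when a proxy CONNECT times out.

```php
<?php
// Sketch: run a batch with curl_multi and record each handle's own CURLcode.
// runMultiBatch() is a hypothetical helper, not part of PHP or libcurl.
// Assumes every handle was created with CURLOPT_RETURNTRANSFER = true.
function runMultiBatch(array $handles): array
{
    $mh = curl_multi_init();
    foreach ($handles as $handle) {
        curl_multi_add_handle($mh, $handle);
    }

    $results = [];
    $active = 0;
    do {
        $status = curl_multi_exec($mh, $active);

        // Drain every transfer that has finished so far. 'result' is the
        // CURLcode of that one handle, independent of the other transfers.
        while (($info = curl_multi_info_read($mh)) !== false) {
            $handle = $info['handle'];
            $code = $info['result'];
            $key = array_search($handle, $handles, true);
            $results[$key] = [
                'ok'    => $code === CURLE_OK,
                'code'  => $code,
                'error' => $code === CURLE_OK ? '' : (curl_error($handle) ?: curl_strerror($code)),
                'body'  => $code === CURLE_OK ? curl_multi_getcontent($handle) : null,
            ];
            curl_multi_remove_handle($mh, $handle);
        }

        // Block until there is socket activity (or 1 s passes)
        if ($active && curl_multi_select($mh, 1.0) === -1) {
            usleep(100); // select() unavailable: back off briefly instead of spinning
        }
    } while ($active && ($status === CURLM_OK || $status === CURLM_CALL_MULTI_PERFORM));

    curl_multi_close($mh);
    return $results;
}
```

Handles are removed from the multi handle as they finish, but the caller still owns them and should `curl_close()` each one afterwards.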
Here is curl's verbose output:
Starting (PID: 23801)...
starting..
scrape limit test completed, continue..
getting proxy list..
proxy retrieved, continue to getting serp list..
* Trying 11.148.119.10...
* Trying 181.180.197.12...
* Trying 12.168.17.164...
* Trying 181.181.191.151...
* Trying 181.134.18.121...
* Trying 178.151.187.1...
* Trying 185.159.12.141...
* Trying 185.16.100.133...
* Trying 185.18.12.11...
* Connected to 11.148.119.10 (11.148.119.10) port 5000 (#0)
* Establish HTTP proxy tunnel to www.google.cz:443
* Proxy auth using Basic with user 'user4058'
> CONNECT www.google.cz:443 HTTP/1.1
Host: www.google.cz:443
Proxy-Authorization: Basic dXNlcjQwNTg6RnZRT29KMU5rcg==
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/601.2.7 (KHTML, like Gecko) Version/9.0.1 Safari/601.2.7
Proxy-Connection: Keep-Alive
< HTTP/1.1 200 Connection established
<
* Proxy replied OK to CONNECT request
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
* Connected to 181.180.197.12 (181.180.197.12) port 5000 (#1)
* Establish HTTP proxy tunnel to www.google.cz:443
* Proxy auth using Basic with user 'user1760'
> CONNECT www.google.cz:443 HTTP/1.1
Host: www.google.cz:443
Proxy-Authorization: Basic dXNlcjE3NjA6R3BMbHA4enJJaw==
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; Touch; rv:11.0) like Gecko
Proxy-Connection: Keep-Alive
< HTTP/1.1 200 Connection established
<
* Proxy replied OK to CONNECT request
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
* Connected to 12.168.17.164 (12.168.17.164) port 5000 (#2)
* Establish HTTP proxy tunnel to www.google.cz:443
* Proxy auth using Basic with user 'user3193'
> CONNECT www.google.cz:443 HTTP/1.1
Host: www.google.cz:443
Proxy-Authorization: Basic dXNlcjMxOTM6R3BMbHA4enJJaw==
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/601.7.1 (KHTML, like Gecko) Version/9.1.2 Safari/601.7.1
Proxy-Connection: Keep-Alive
< HTTP/1.1 200 Connection established
<
* Proxy replied OK to CONNECT request
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
* Connected to 181.181.191.151 (181.181.191.151) port 5000 (#3)
* Establish HTTP proxy tunnel to www.google.cz:443
* Proxy auth using Basic with user 'user3085'
> CONNECT www.google.cz:443 HTTP/1.1
Host: www.google.cz:443
Proxy-Authorization: Basic dXNlcjMwODU6R3BMbHA4enJJaw==
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9
Proxy-Connection: Keep-Alive
< HTTP/1.1 200 Connection established
<
* Proxy replied OK to CONNECT request
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
* Connected to 181.134.18.121 (181.134.18.121) port 3128 (#4)
* Establish HTTP proxy tunnel to www.google.cz:443
* Proxy auth using Basic with user 'clbaddr03810'
> CONNECT www.google.cz:443 HTTP/1.1
Host: www.google.cz:443
Proxy-Authorization: Basic Y2xiYWRkcjAzODEwOmNMaW0yMjUx
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2486.0 Safari/537.36 Edge/13.10586
Proxy-Connection: Keep-Alive
< HTTP/1.1 200 Connection established
<
* Proxy replied OK to CONNECT request
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
* Connected to 178.151.187.1 (178.151.187.1) port 3128 (#5)
* Establish HTTP proxy tunnel to www.google.cz:443
* Proxy auth using Basic with user 'clbaddr02305'
> CONNECT www.google.cz:443 HTTP/1.1
Host: www.google.cz:443
Proxy-Authorization: Basic Y2xiYWRkcjAyMzA1OmNMaW0yMjUx
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/600.3.18 (KHTML, like Gecko) Version/8.0.3 Safari/600.3.18
Proxy-Connection: Keep-Alive
< HTTP/1.1 200 Connection established
<
* Proxy replied OK to CONNECT request
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
* Connected to 185.16.100.133 (185.16.100.133) port 5000 (#7)
* Establish HTTP proxy tunnel to www.google.cz:443
* Proxy auth using Basic with user 'user1927'
> CONNECT www.google.cz:443 HTTP/1.1
Host: www.google.cz:443
Proxy-Authorization: Basic dXNlcjE5Mjc6aWhxSXBNaGpyeg==
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2486.0 Safari/537.36 Edge/13.10586
Proxy-Connection: Keep-Alive
< HTTP/1.1 200 Connection established
<
* Proxy replied OK to CONNECT request
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
* Connected to 185.18.12.11 (185.18.12.11) port 5000 (#8)
* Establish HTTP proxy tunnel to www.google.cz:443
* Proxy auth using Basic with user 'user1589'
> CONNECT www.google.cz:443 HTTP/1.1
Host: www.google.cz:443
Proxy-Authorization: Basic dXNlcjE1ODk6WVVjNXkyZ3hQVw==
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0
Proxy-Connection: Keep-Alive
* Proxy CONNECT aborted due to timeout
* Operation timed out after 0 milliseconds with 0 out of 0 bytes received
* Closing connection 0
* Operation timed out after 0 milliseconds with 0 out of 0 bytes received
* Closing connection 1
* Operation timed out after 0 milliseconds with 0 out of 0 bytes received
* Closing connection 2
* Operation timed out after 0 milliseconds with 0 out of 0 bytes received
* Closing connection 3
* Operation timed out after 0 milliseconds with 0 out of 0 bytes received
* Closing connection 4
* Operation timed out after 0 milliseconds with 0 out of 0 bytes received
* Closing connection 5
* Connection timed out after 5000 milliseconds
* Closing connection 6
* Operation timed out after 0 milliseconds with 0 out of 0 bytes received
* Closing connection 7
* Empty reply from server
* Connection #8 to host 185.18.12.11 left intact
Results after processing:
serp list retrieved, processing errors
SERP LIST error:Operation timed out after 0 milliseconds with 0 out of 0 bytes received on 11.148.119.10
SERP LIST error:Operation timed out after 0 milliseconds with 0 out of 0 bytes received on 181.180.197.12
SERP LIST error:Operation timed out after 0 milliseconds with 0 out of 0 bytes received on 12.168.17.164
SERP LIST error:Operation timed out after 0 milliseconds with 0 out of 0 bytes received on 181.181.191.151
SERP LIST error:Operation timed out after 0 milliseconds with 0 out of 0 bytes received on 181.134.18.121
SERP LIST error:Operation timed out after 0 milliseconds with 0 out of 0 bytes received on 178.151.187.1
SERP LIST error:Connection timed out after 5000 milliseconds on 185.159.12.141
SERP LIST error:Operation timed out after 0 milliseconds with 0 out of 0 bytes received on 185.16.100.133
SERP LIST error:Proxy CONNECT aborted due to timeout on 185.18.12.11
Is there a way to configure multi curl so that it doesn't fail the whole batch in this situation?