Please tell me whether there is any limit on sending requests with multi_curl. When I try to send more than 200 requests, it times out.
See the code below:
foreach ($newUrlArry as $url) {
    $gatherUrl[] = $url['url'];
}
/* Array slice */
$totalUrlRequest = count($gatherUrl);
if ($totalUrlRequest > 10) {
    $offset = 10;
    $index = 0;
    $matchedAnchors = array();
    $dom = new DOMDocument;
    $NoOfTilesRequest = ceil($totalUrlRequest / $offset);
    for ($sl = 0; $sl < $NoOfTilesRequest; $sl++) {
        $output = array_slice($gatherUrl, $index, $offset);
        $index = $offset + $index;
        $responseAction = $this->multiRequestAction($output);
        $k = 0;
        foreach ($responseAction as $responseHtml) {
            @$dom->loadHTML($responseHtml);
            $documentLinks = $dom->getElementsByTagName("a");
            $chieldFlag = false;
            for ($i = 0; $i < $documentLinks->length; $i++) {
                $documentLink = $documentLinks->item($i);
                if ($documentLink->hasAttribute('href') AND substr($documentLink->getAttribute('href'), 0, strlen($match)) == $match) {
                    $description = $documentLink->childNodes;
                    foreach ($description as $words) {
                        $name = trim($words->nodeName);
                        if ($name == 'em' || $name == 'b' || $name == "span" || $name == "p") {
                            if (!empty($words->nodeValue)) {
                                $matchedAnchors[$sl][$k]['anchor'] = trim($words->nodeValue);
                                $matchedAnchors[$sl][$k]['img'] = 0;
                                if ($documentLink->hasAttribute('rel')) {
                                    $matchedAnchors[$sl][$k]['rel'] = 'Y';
                                } else {
                                    $matchedAnchors[$sl][$k]['rel'] = 'N';
                                }
                                $chieldFlag = true;
                                break;
                            }
                        } elseif ($name == 'img') {
                            $alt = $words->getAttribute('alt');
                            if (!empty($alt)) {
                                $matchedAnchors[$sl][$k]['anchor'] = trim($words->getAttribute('alt'));
                                $matchedAnchors[$sl][$k]['img'] = 1;
                                if ($documentLink->hasAttribute('rel')) {
                                    $matchedAnchors[$sl][$k]['rel'] = 'Y';
                                } else {
                                    $matchedAnchors[$sl][$k]['rel'] = 'N';
                                }
                                $chieldFlag = true;
                                break;
                            }
                        }
                    }
                    if (!$chieldFlag) {
                        $matchedAnchors[$sl][$k]['anchor'] = $documentLink->nodeValue;
                        $matchedAnchors[$sl][$k]['img'] = 0;
                        if ($documentLink->hasAttribute('rel')) {
                            $matchedAnchors[$sl][$k]['rel'] = 'Y';
                        } else {
                            $matchedAnchors[$sl][$k]['rel'] = 'N';
                        }
                    }
                }
            }
            $k++;
        }
    }
}
Answer 0 (score: 4)
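@Phliplip and @lunixbochs have mentioned the common cURL pitfalls (max execution time and being refused by the target server).
When sending that many cURL requests to the same server, I try to be "nice" and voluntarily add sleep time so that I don't bombard the host. For a low-end site, 1000+ requests can feel like a mini DDoS!
Here is code that worked for me. I used it to scrape a client's product data from their old website, because the data was locked in a proprietary database system with NO export function.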
<?php
header('Content-type: text/html; charset=utf-8', true);
set_time_limit(0); // lift PHP's execution time limit for this long-running script
$urls = array(
    'http://www.example.com/cgi-bin/product?id=500',
    'http://www.example.com/cgi-bin/product?id=501',
    'http://www.example.com/cgi-bin/product?id=502',
    'http://www.example.com/cgi-bin/product?id=503',
    'http://www.example.com/cgi-bin/product?id=504',
);
$i = 0;
foreach ($urls as $url) {
    echo $url . "\n";
    $curl = curl_init($url);
    $userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';
    curl_setopt($curl, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($curl, CURLOPT_AUTOREFERER, true);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_TIMEOUT, 25);
    $html = curl_exec($curl);
    if ($html === false) {
        // curl_exec() returns false on failure; report it instead of converting
        echo 'cURL error: ' . curl_error($curl) . "\n";
    } else {
        $html = @mb_convert_encoding($html, 'HTML-ENTITIES', 'utf-8');
        // now do something with info returned by curl
    }
    curl_close($curl);
    $i++;
    // pause 20 seconds after every 10th request, 2 seconds otherwise
    if ($i % 10 == 0) {
        sleep(20);
    } else {
        sleep(2);
    }
}
?>
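The key feature is the sleep() calls: in my experience, sleep() stops the server from rejecting you.
However, if by "different servers" you mean that you are sending only a few requests to each of a large number of servers, for example: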
$urls = array(
    'http://www.example-one.com/',
    'http://www.example-two.com/',
    'http://www.example-three.com/',
    'http://www.example-four.com/',
    'http://www.example-five.com/',
    'http://www.example-six.com/'
);
and you are using set_time_limit(0);
then there is an error that is probably causing your code to die;
Try
ini_set('display_errors',1);
error_reporting(E_ALL);
and tell us the error message you get.
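If PHP itself reports nothing, the failure may be on the cURL side. Here is a minimal sketch for getting cURL's own error message for a single request. The function name fetchWithDiagnostics and the shape of the array it returns are my own invention for illustration; the curl_* calls themselves are standard.

```php
<?php
// Surface PHP-level errors while debugging.
ini_set('display_errors', '1');
error_reporting(E_ALL);

/**
 * Hypothetical helper: run one request and report why it failed, if it did.
 * Returns an array with keys ok, errno, error, http_code, total_time, body.
 */
function fetchWithDiagnostics(string $url, int $timeoutSeconds = 25): array
{
    $curl = curl_init($url);
    if ($curl === false) {
        // The handle could not be created at all.
        return ['ok' => false, 'errno' => null, 'error' => 'curl_init failed',
                'http_code' => null, 'total_time' => null, 'body' => null];
    }
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_TIMEOUT, $timeoutSeconds);

    $body = curl_exec($curl); // false on failure

    // The handle is released when it goes out of scope.
    return [
        'ok'         => $body !== false,
        'errno'      => curl_errno($curl),   // 0 means no cURL error
        'error'      => curl_error($curl),   // human-readable message
        'http_code'  => curl_getinfo($curl, CURLINFO_HTTP_CODE),
        'total_time' => curl_getinfo($curl, CURLINFO_TOTAL_TIME),
        'body'       => $body === false ? null : $body,
    ];
}
```

Calling this on one of the URLs that fails and printing the errno and error fields should show whether it was a timeout, a refused connection, or something else.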
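Answer 1 (score: 1)
PHP does not impose a limit on the number of connections used with curl_multi_init, but memory usage and the time limit will become a problem.
Check the memory_limit setting in php.ini and try increasing it to see whether that helps.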
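One way to keep memory bounded, instead of only raising memory_limit, is to run curl_multi on small batches and process each batch before fetching the next. This is a rough sketch rather than a tested production routine: fetchBatch and processInBatches are hypothetical names, and the batch size and pause are values you would have to tune yourself.

```php
<?php
/**
 * Hypothetical helper: fetch one small batch of URLs in parallel.
 * Returns [key => body string, or null when that transfer failed].
 */
function fetchBatch(array $urls, int $timeoutSeconds = 25): array
{
    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $handle = curl_init($url);
        curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($handle, CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($multi, $handle);
        $handles[$key] = $handle;
    }

    // Drive the transfers until none are still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($status === CURLM_OK && $running > 0) {
            curl_multi_select($multi, 1.0); // wait for activity instead of spinning
        }
    } while ($status === CURLM_OK && $running > 0);

    // Per-transfer results are reported through curl_multi_info_read().
    $failed = [];
    while (($info = curl_multi_info_read($multi)) !== false) {
        if ($info['result'] !== CURLE_OK) {
            $key = array_search($info['handle'], $handles, true);
            if ($key !== false) {
                $failed[$key] = true;
            }
        }
    }

    $bodies = [];
    foreach ($handles as $key => $handle) {
        $bodies[$key] = isset($failed[$key]) ? null : curl_multi_getcontent($handle);
        curl_multi_remove_handle($multi, $handle);
    }
    return $bodies;
}

/**
 * Hypothetical driver: split the URLs into batches, hand each batch's bodies
 * to $handleBodies, and pause between batches. Returns the number of batches.
 */
function processInBatches(array $urls, int $batchSize, int $pauseSeconds,
                          callable $fetchBatch, callable $handleBodies): int
{
    $batches = array_chunk($urls, max(1, $batchSize), true); // keep original keys
    $lastIndex = count($batches) - 1;
    foreach ($batches as $index => $batch) {
        // Bodies from this batch can be freed before the next batch starts.
        $handleBodies($fetchBatch($batch));
        if ($pauseSeconds > 0 && $index < $lastIndex) {
            sleep($pauseSeconds);
        }
    }
    return count($batches);
}
```

For the code in the question, something like processInBatches($gatherUrl, 10, 2, 'fetchBatch', $parseLinks) would keep at most 10 responses in memory at a time, where $parseLinks stands for whatever closure does the DOM parsing. Logging memory_get_peak_usage(true) next to ini_get('memory_limit') after each batch should show how close the script gets to the limit.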