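I have an array of 10 proxies and a set of 2000 keywords to scrape data for. I need to pass 20 keywords to each proxy IP every hour. The hourly schedule is already set up, i.e. 200 keywords per hour spread across the 10 proxies, picked in random order. A proxy from the $proxy array should not be repeated at any time. I am using the code below, but it does not give the expected result.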
$proxy = [
    '0' => 'x.x.x.x:80',
    '1' => 'x.x.x.x:80',
    '2' => 'x.x.x.x:80',
    '3' => 'x.x.x.x:80',
    '4' => 'x.x.x.x:80',
    '5' => 'x.x.x.x:80',
    '6' => 'x.x.x.x:80',
    '7' => 'x.x.x.x:80',
    '8' => 'x.x.x.x:80',
    '9' => 'x.x.x.x:80'
];
foreach ($proxy as $prox) {
    $website[] = $prox[mt_rand(0, count($prox) - 1)];
    $keyword = keyword::take(20)->get();
    foreach ($keyword as $list) {
        //scraping code
        $client = new Client($website);
        $client->setAuth('xxx', 'xxx', 'basic');
        $client->setHeader('user-agent', "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/6.0.472.63 Safari/534.3");
        $crawler = $client->request('GET', $url);
        $status = $client->getResponse()->getStatus();
        //store the data in database
    }
}
Any suggestions would be greatly appreciated!
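
One possible direction, offered as a minimal sketch rather than a definitive fix: inside the loop `$prox` is already a single proxy string, so indexing it with `mt_rand()` does not select a proxy, and `take(20)` without an offset appears to return the same 20 keywords on every pass. Shuffling the proxy list once and then walking through it visits each proxy exactly once, and splitting the keywords into batches of 20 gives each proxy a different batch. The function name `assignKeywordsToProxies`, the placeholder proxy addresses, and the generated keyword strings are assumptions made for illustration; the HTTP client calls and the keyword model query are deliberately left out.

```php
<?php
// Hypothetical helper (not from the original post): pairs each proxy with
// its own batch of keywords so that no proxy is used twice in one run.
function assignKeywordsToProxies(array $proxies, array $keywords, int $perProxy = 20): array
{
    // shuffle() randomizes the order in place and reindexes from 0.
    // Iterating the shuffled list visits every proxy exactly once.
    shuffle($proxies);

    // array_chunk() splits the keywords into consecutive batches of $perProxy.
    $batches = array_chunk($keywords, $perProxy);

    $plan = [];
    foreach ($proxies as $i => $proxy) {
        if (!isset($batches[$i])) {
            break; // fewer batches than proxies: nothing left to assign
        }
        $plan[] = ['proxy' => $proxy, 'keywords' => $batches[$i]];
    }
    return $plan;
}

// Placeholder data (assumed): 10 made-up proxy addresses, 200 made-up keywords.
$proxies = [];
for ($i = 1; $i <= 10; $i++) {
    $proxies[] = "10.0.0.$i:80";
}
$keywords = [];
for ($i = 1; $i <= 200; $i++) {
    $keywords[] = "keyword$i";
}

$plan = assignKeywordsToProxies($proxies, $keywords);
foreach ($plan as $entry) {
    // The scraping request for each keyword would be sent through $entry['proxy'] here.
    echo $entry['proxy'], ' => ', count($entry['keywords']), " keywords\n";
}
```

For the hourly schedule, the same function could be called with the next slice of 200 keywords each hour, for example `array_slice($allKeywords, $hour * 200, 200)`, where `$allKeywords` and `$hour` are again assumed names. Any batches beyond the number of proxies are simply not assigned in that run.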