Iterating over multiple proxies with Goutte - Laravel

Posted: 2017-03-14 19:23:00

Tags: arrays proxy cron laravel-5.4 goutte

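I have an array of 10 proxies and a set of 2000 keywords to scrape data for. Each proxy IP should handle 20 keywords per hour. The job already runs hourly, so that is 200 keywords per hour, spread across the 10 proxies taken from the array in random order. A proxy from the array should never be repeated at any time. I'm using the code below, but it doesn't produce the expected result.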

$proxy = [
    '0' => 'x.x.x.x:80',
    '1' => 'x.x.x.x:80',
    '2' => 'x.x.x.x:80',
    '3' => 'x.x.x.x:80',
    '4' => 'x.x.x.x:80',
    '5' => 'x.x.x.x:80',
    '6' => 'x.x.x.x:80',
    '7' => 'x.x.x.x:80',
    '8' => 'x.x.x.x:80',
    '9' => 'x.x.x.x:80'
];

foreach ($proxy as $prox) {

    $website[] = $prox[mt_rand(0, count($prox) - 1)];

    $keyword = keyword::take(20)->get();

    foreach ($keyword as $list) {

        //scraping code
        $client = new Client($website);

        $client->setAuth('xxx', 'xxx', 'basic');

        $client->setHeader('user-agent', "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/6.0.472.63 Safari/534.3");
        $crawler = $client->request('GET', $url);
        $status = $client->getResponse()->getStatus();

        //store the data in database
    }
}

Any suggestions would be appreciated!
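A minimal sketch of the scheduling part, assuming the requirement means "every hourly run uses each of the 10 proxies exactly once, in random order, and each proxy gets 20 keywords not processed in any earlier run". Two details in the code above work against this. `$prox[mt_rand(0, count($prox) - 1)]` indexes a character of the proxy string instead of choosing a proxy. `keyword::take(20)->get()` returns the same first 20 rows on every iteration because no offset is applied. `planHour()` and its parameters are hypothetical names, not part of Laravel or Goutte. The code is plain PHP 7+ with no dependencies.

```php
<?php

/**
 * Hypothetical helper: plan one hourly run.
 * Every proxy is used exactly once per run, in a random order, and each
 * proxy gets its own consecutive block of $perProxy keywords.
 *
 * @param string[] $proxies  list of "host:port" strings
 * @param array    $keywords all keywords, in a stable order (e.g. by id)
 * @param int      $hour     0-based number of the current run (assumed to be tracked by the caller)
 * @param int      $perProxy keywords per proxy per run (20 in the question)
 * @return array   list of ['proxy' => string, 'keywords' => array]
 */
function planHour(array $proxies, array $keywords, int $hour, int $perProxy = 20): array
{
    $proxies = array_values($proxies);
    shuffle($proxies);                          // random order; each proxy still appears once

    $perHour = $perProxy * count($proxies);     // 20 * 10 = 200 keywords per run
    $offset  = $hour * $perHour;                // run 0 starts at keyword 0, run 1 at 200, ...

    $plan = [];
    foreach ($proxies as $i => $proxy) {
        // The i-th proxy (after shuffling) takes the i-th block of this run's window.
        $batch = array_slice($keywords, $offset + $i * $perProxy, $perProxy);
        if ($batch === []) {
            break;                              // no keywords left to process
        }
        $plan[] = ['proxy' => $proxy, 'keywords' => $batch];
    }

    return $plan;
}
```

With 2000 keywords, runs 0 through 9 cover everything and run 10 returns an empty plan. The hourly cron job would call `planHour($proxies, $keywords, $hour)` using a run counter it stores itself (an assumption; for example in the cache or the database). It would then create one Goutte client per entry and loop over `$job['keywords']`. Assuming Goutte 3.x on Guzzle 6, the `Client` constructor does not take a proxy. A proxy is usually set by giving the Goutte client its own Guzzle client, e.g. `$client->setClient(new \GuzzleHttp\Client(['proxy' => 'http://' . $job['proxy']]))`. On the Eloquent side, the same 200-row window could be loaded with something like `keyword::orderBy('id')->skip($hour * 200)->take(200)->get()`. This assumes the table has an `id` column to give a stable order.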

0 answers