PHP script stops running on shared hosting

Time: 2018-07-11 00:35:24

Tags: php curl web-scraping timeout simple-html-dom

I have searched here and on Google for an answer, but couldn't find one.

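My PHP script simply stops partway through execution, and it stops at a different point every time I run it. I don't think the cURL function is the problem, because the script sometimes stops before a cURL call and sometimes after one. I don't think it's a bug in the code either, because the code produces correct results for as long as it runs. My guess is that the shared host is imposing some kind of timeout.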
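
To find out where and why the script stops, a minimal logging sketch may help. It assumes PHP 7.1 or later; the log path and the helper name log_checkpoint() are hypothetical. Checkpoint lines record the last step reached plus elapsed time and memory, and a shutdown function records the last PHP error (for example a fatal "Maximum execution time" or "Allowed memory size" error). If the log ends abruptly with no shutdown line at all, that suggests the process was killed from outside PHP, because a shutdown function cannot run after SIGKILL.

```php
<?php
// Hypothetical log location: change it to a writable path on your host.
$logFile = sys_get_temp_dir() . '/scraper.log';

// Appends one timestamped progress line. After the script dies, the last
// line in the log shows how far it got and how much memory it was using.
function log_checkpoint(string $logFile, string $label): string
{
    $line = sprintf(
        "%s %s elapsed=%.1fs mem=%dKB peak=%dKB\n",
        date('Y-m-d H:i:s'),
        $label,
        microtime(true) - $_SERVER['REQUEST_TIME_FLOAT'],
        intdiv(memory_get_usage(true), 1024),
        intdiv(memory_get_peak_usage(true), 1024)
    );
    file_put_contents($logFile, $line, FILE_APPEND | LOCK_EX);
    return $line;
}

// Runs when the script ends, including after fatal errors such as
// "Maximum execution time exceeded" or "Allowed memory size exhausted".
// It does NOT run if the process is killed from outside (e.g. SIGKILL).
// error_get_last() returns the most recent error of any level, so a
// warning may show up here even after a clean finish.
register_shutdown_function(function () use ($logFile): void {
    $err = error_get_last();
    $msg = $err === null
        ? "shutdown: no PHP error recorded\n"
        : sprintf("shutdown: last error: %s in %s:%d\n", $err['message'], $err['file'], $err['line']);
    file_put_contents($logFile, $msg, FILE_APPEND | LOCK_EX);
});

// Example calls placed around the steps of the real script:
log_checkpoint($logFile, 'start');
// ... scrap(...) ...
log_checkpoint($logFile, 'after scrap #1');
```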

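The code basically does web scraping with the simple_html_dom library and cURL. I run it on shared web hosting (HostGator), and I have also tried running it as a cron job, but that doesn't work either.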
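
When running from cron, a run that takes longer than the cron interval can overlap with the next one, and two copies of the scraper then compete for memory and proxies. A minimal sketch of a non-blocking lock, assuming the hypothetical lock file path below is writable, makes a new run exit while an earlier one is still active (acquire_run_lock() is a hypothetical helper):

```php
<?php
// Hypothetical guard against overlapping cron runs: returns an open, locked
// file handle, or null if another run still holds the lock.
function acquire_run_lock(string $lockFile)
{
    $fh = fopen($lockFile, 'c'); // create if missing, never truncate
    if ($fh === false) {
        return null;
    }
    // LOCK_NB makes flock() fail immediately instead of waiting.
    if (!flock($fh, LOCK_EX | LOCK_NB)) {
        fclose($fh);
        return null;
    }
    return $fh; // keep the handle open; the lock is released when the script ends
}

$lock = acquire_run_lock(sys_get_temp_dir() . '/scraper.lock');
if ($lock === null) {
    exit("Previous run still active, skipping.\n");
}
// ... scraping code ...
```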

I have already set the following at the start of the script (and also changed the corresponding values in php.ini), but it didn't help:

    ignore_user_abort(true);
    set_time_limit(0);
    ini_set('max_execution_time', 0);
    ini_set('memory_limit', -1);
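
Setting these values does not guarantee that they took effect: set_time_limit() can be disabled through disable_functions, ini_set() returns false when a value cannot be changed, and cron jobs often run the PHP CLI binary, which may load a different php.ini than web requests. A small sketch with a hypothetical helper, report_limits(), prints what PHP itself reports. Limits enforced outside PHP, for example by the web server or by a process watchdog on the host, do not show up here.

```php
<?php
// Reports whether the runtime overrides actually took effect.
function report_limits(): array
{
    // function_exists() returns false for functions listed in disable_functions.
    $timeLimitOk = function_exists('set_time_limit') && set_time_limit(0);
    $oldMemory = ini_set('memory_limit', '-1'); // false if the value is locked

    return [
        'sapi'               => PHP_SAPI, // "cli" when run as `php script.php`
        'set_time_limit(0)'  => $timeLimitOk ? 'ok' : 'refused or disabled',
        'max_execution_time' => ini_get('max_execution_time'),
        'memory_limit'       => $oldMemory === false ? 'refused' : ini_get('memory_limit'),
        'disable_functions'  => ini_get('disable_functions'),
        'loaded php.ini'     => php_ini_loaded_file(), // false if no php.ini was loaded
    ];
}

print_r(report_limits());
```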

Full code (shortened a bit: in the original code I use several different dates and call the 'scrap' function multiple times):

    <?php
    require('simple_html_dom.php');

    // Note: $location_id, $location and $city are not defined in this shortened excerpt.

    // Today's date
    $today = date('Y-m-d');

    // Compare the hour as an integer rather than as a string
    if ((int) date('H') < 9) {

        $date_period = "today";
        $date_period_date = date('Y-m-d');

        // pu* = today, do* = tomorrow
        $puDay = date('j');
        $puMonth = date('n');
        $puYear = date('Y');
        $doDay = date('j', strtotime('+1 day'));
        $doMonth = date('n', strtotime('+1 day'));
        $doYear = date('Y', strtotime('+1 day'));

        scrap($puDay, $puMonth, $puYear, $doDay, $doMonth, $doYear, $date_period, $today, $date_period_date, $location_id, $location, $city);

        unset($date_period, $date_period_date, $puDay, $puMonth, $puYear, $doDay, $doMonth, $doYear);
    }

    // Functions

    function scrap($puDay_aux, $puMonth_aux, $puYear_aux, $doDay_aux, $doMonth_aux, $doYear_aux, $period_id_aux, $curDate_aux, $periodDate_aux, $location_id_aux, $location_aux, $city_aux) {

        $bad_proxy = "";
        $link = "my link";

        // Up to 4 attempts; a failed attempt marks its proxy as bad for the next one.
        for ($attempt = 1; $attempt < 5; $attempt++) {

            $best_proxy = get_best_proxy($link, $bad_proxy);
            $scraped_page = curl($link, $best_proxy);

            // Validate the raw response (a parser object is never empty()):
            // a page without '</html>' is treated as incomplete.
            if ($scraped_page === '' || strpos($scraped_page, '</html>') === false) {
                $bad_proxy = $best_proxy;
                continue;
            }

            $html = new simple_html_dom();
            $html->load($scraped_page);

            foreach ($html->find('table[class=ResultRow]') as $element) {
                $supplier = $element->find('h4', 0);
                // (extraction of the remaining fields is omitted in this shortened version)
                unset($supplier);
            }

            // Break simple_html_dom's internal references so each page's memory is released.
            $html->clear();
            unset($html);

            return true; // success: stop retrying
        }

        return false; // all attempts failed
    }



    function get_best_proxy($link, $bad_proxy) {

        $proxy_array = array(
            '177.184.144.130:8080',
            '177.6.147.202:8080',
            '187.44.1.167:8080',
            '170.82.228.42:8080',
            '177.72.1.102:8080',
            '138.185.101.20:8080',
            '187.102.149.178:8080',
            '177.32.12.127:8080',
            '189.38.3.9:8080',
            '138.185.101.21:8080'
        );

        // Results keyed by proxy string, so the minimum maps straight back to its proxy
        // even when $bad_proxy is skipped.
        $proxy_speed_result = array();

        foreach ($proxy_array as $proxy) {
            if ($proxy === $bad_proxy) {
                continue; // skip the proxy that failed last time
            }

            $proxy_speed_result[$proxy] = proxy_speed($proxy, $link);

            // Stop at the first proxy that answers at all.
            if ($proxy_speed_result[$proxy] < 9999999) {
                break;
            }
        }

        // Fastest of the probed proxies (failed probes all return 9999999).
        $min = array_keys($proxy_speed_result, min($proxy_speed_result));

        return $min[0];
    }




    function proxy_speed($proxy, $link) {

        $loadingtime = microtime(true);
        $theHeader = curl_init($link);
        curl_setopt($theHeader, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($theHeader, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($theHeader, CURLOPT_AUTOREFERER, 1);
        curl_setopt($theHeader, CURLOPT_MAXREDIRS, 10);
        curl_setopt($theHeader, CURLOPT_USERAGENT, "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1a2pre) Gecko/2008073000 Shredder/3.0a2pre ThunderBrowse/3.2.1.8");
        curl_setopt($theHeader, CURLOPT_HTTPPROXYTUNNEL, 1);
        curl_setopt($theHeader, CURLOPT_SSL_VERIFYPEER, 0);
        curl_setopt($theHeader, CURLOPT_SSL_VERIFYHOST, 0);
        curl_setopt($theHeader, CURLOPT_CONNECTTIMEOUT, 10); // max 10 s to connect
        curl_setopt($theHeader, CURLOPT_TIMEOUT, 10);        // max 10 s for the whole probe
        curl_setopt($theHeader, CURLOPT_PROXY, $proxy);
        $curlResponse = curl_exec($theHeader);
        curl_close($theHeader); // free the handle

        if ($curlResponse === false) {
            return 9999999; // probe failed
        }

        // Elapsed time in seconds, with sub-second resolution
        return microtime(true) - $loadingtime;
    }




    function curl($url, $proxy) {

        $options = array(
            CURLOPT_RETURNTRANSFER  => true,  // return the page body instead of printing it
            CURLOPT_FOLLOWLOCATION  => true,  // follow HTTP 'Location' redirects
            CURLOPT_AUTOREFERER     => true,  // set the Referer header when following redirects
            CURLOPT_CONNECTTIMEOUT  => 300,   // max seconds to establish the connection
            CURLOPT_TIMEOUT         => 300,   // max seconds for the whole request
            CURLOPT_MAXREDIRS       => 10,    // max number of redirects to follow
            CURLOPT_USERAGENT       => "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1a2pre) Gecko/2008073000 Shredder/3.0a2pre ThunderBrowse/3.2.1.8",
            CURLOPT_URL             => $url,
            CURLOPT_HTTPPROXYTUNNEL => true,
            CURLOPT_SSL_VERIFYPEER  => false,
            CURLOPT_SSL_VERIFYHOST  => false,
            CURLOPT_PROXY           => $proxy
        );

        $ch = curl_init();
        curl_setopt_array($ch, $options);
        $data = curl_exec($ch);

        if ($data === false) {
            // Return the error text; the caller treats it as an incomplete page.
            $data = curl_error($ch);
        }

        curl_close($ch); // close the handle before returning
        return stripslashes($data);
    }
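
Choosing the fastest working proxy is easier to get right when each timing is keyed by its proxy string, so no separate counter has to stay in sync with the proxy list. A small sketch of that idea follows; pick_fastest_proxy() is a hypothetical helper, and the timings are made-up illustration values, not measurements. Real timings could come from microtime(true) around curl_exec(), or from curl_getinfo($ch, CURLINFO_TOTAL_TIME), which returns the transfer time in seconds as a float.

```php
<?php
// Picks the fastest working proxy from a map of proxy => probe time in
// seconds (null means the probe failed), skipping the proxy marked as bad.
function pick_fastest_proxy(array $timings, string $badProxy = ''): ?string
{
    $best = null;
    $bestTime = INF;
    foreach ($timings as $proxy => $seconds) {
        if ((string) $proxy === $badProxy || $seconds === null) {
            continue;
        }
        if ($seconds < $bestTime) {
            $bestTime = $seconds;
            $best = (string) $proxy;
        }
    }
    return $best; // null if no proxy worked
}

// Illustrative values only.
$timings = [
    '177.6.147.202:8080' => 2.4,
    '187.44.1.167:8080'  => 0.9,
    '189.38.3.9:8080'    => null,
];
echo pick_fastest_proxy($timings, '187.44.1.167:8080'), "\n"; // 177.6.147.202:8080
```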

Does anyone know what is happening here? Is my web host timing out? Thanks!
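
A rough worst-case estimate from the timeouts in the code: each of up to 4 attempts in scrap() may probe all 10 proxies at up to 10 seconds each and then wait up to 300 seconds in curl(), i.e. 4 × (10 × 10 + 300) = 1600 seconds, about 27 minutes for a single scrap() call, and the full script calls it several times. According to the PHP manual, on non-Windows systems max_execution_time only counts time the script itself spends executing, not time spent in system calls or stream operations, so much of that waiting may not count toward PHP's own limit. That would make a limit enforced by the host outside PHP a plausible cause, though HostGator's actual limits are an assumption to confirm with the host. The helper worst_case_seconds() is hypothetical.

```php
<?php
// Rough upper bound on wall-clock time for one scrap() call, assuming every
// proxy probe and every page fetch runs into its timeout.
function worst_case_seconds(int $attempts, int $proxies, int $probeTimeout, int $fetchTimeout): int
{
    // Per attempt: get_best_proxy() may probe every proxy, then curl() waits up to $fetchTimeout.
    return $attempts * ($proxies * $probeTimeout + $fetchTimeout);
}

// 4 attempts, 10 proxies, CURLOPT_TIMEOUT 10 in proxy_speed(), CURLOPT_TIMEOUT 300 in curl().
$seconds = worst_case_seconds(4, 10, 10, 300);
echo $seconds, " s = ", round($seconds / 60, 1), " min\n";
```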

0 answers