Using cURL and simple_html_dom

Posted: 2018-07-26 15:12:07

Tags: php curl screen-scraping simple-html-dom

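I'm running into a strange problem. I have a script that works fine on localhost, but when I run it on the server it crashes after a few iterations of the loop. The script uses cURL and simple_html_dom to scrape web pages.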

Here is the gist of the code:

    class updateController extends Controller {
        function __construct() {
            // Disable PHP's execution time limit for the long-running scrape
            ini_set('max_execution_time', 0);
            set_time_limit(0);
            require_once 'simple_html_dom.php';
        }

        static public function ThemeforestLoopExisting() {
            $themes = Fulls::where('X', 'Y')->get();

            foreach ($themes as $theme) {
                $cURL = GeneralFunctions::cURLDom($theme['url']);
                // Here I search for specific parts of the page using simple_html_dom's find() method
            }
        }
    }

GeneralFunctions.php:

    class GeneralFunctions {
        static public function cURL_scraping($url) {
            $curl = curl_init();
            curl_setopt($curl, CURLOPT_URL, $url);
            curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
            curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);
            curl_setopt($curl, CURLOPT_MAXREDIRS, 10);
            curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
            curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A');
            curl_setopt($curl, CURLOPT_HTTPHEADER, array('Expect:'));
            curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
            curl_setopt($curl, CURLOPT_ENCODING, 'identity');

            // Response body (false if the request failed) and HTTP status code
            $response['str']    = curl_exec($curl);
            $response['header'] = curl_getinfo($curl, CURLINFO_HTTP_CODE);

            curl_close($curl);
            return $response;
        }

        static public function cURLDom($url) {
            $cURL_results  = GeneralFunctions::cURL_scraping($url);
            $res['header'] = $cURL_results['header'];
            // Parse the HTML into a simple_html_dom object. Positional arguments:
            // lowercase, forceTagsClosed, target_charset, stripRN, defaultBRText, defaultSpanText
            $res['str'] = str_get_html($cURL_results['str'], false, true, DEFAULT_TARGET_CHARSET, false, DEFAULT_BR_TEXT, DEFAULT_SPAN_TEXT);
            return $res['str'];
        }
    }

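The whole thing runs fine for roughly the first 10/20/30 iterations, and then the server crashes. It works perfectly on localhost. I talked to my web host, but they were no help.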

Is there something I'm missing here? Any help would be much appreciated. Thanks!

1 Answer

Answer (score: 0)

This was actually a database problem. I changed the collation to utf8mb4_general_ci, and that fixed it.
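The answer doesn't say why the collation change helped. One plausible explanation (an assumption, not confirmed in the thread): MySQL's legacy `utf8` character set stores at most 3 bytes per character, so scraped text containing 4-byte UTF-8 characters such as emoji cannot be stored in a `utf8` column (in strict SQL mode the insert fails with an "Incorrect string value" error), whereas `utf8mb4`, the character set behind `utf8mb4_general_ci`, can store them. The helper `needsUtf8mb4()` below is hypothetical and not part of the original code; it is a minimal sketch for checking whether a scraped string contains such characters.

```php
<?php
// Hypothetical helper (not from the original post): returns true if $text
// contains any character in U+10000..U+10FFFF (e.g. emoji). These take
// 4 bytes in UTF-8, which MySQL's legacy 3-byte `utf8` charset cannot
// store; `utf8mb4` can.
function needsUtf8mb4(string $text): bool
{
    // The /u modifier makes PCRE treat the pattern and subject as UTF-8.
    // preg_match() returns false on invalid UTF-8, which we treat as "no".
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $text) === 1;
}

$samples = [
    'Plain ASCII title',
    "Caf\u{E9} Theme",          // U+00E9 is 2 bytes in UTF-8
    "\u{4E2D}\u{6587} Theme",   // CJK characters are 3 bytes each
    "Party Theme \u{1F389}",    // U+1F389 is 4 bytes in UTF-8
];

foreach ($samples as $s) {
    echo $s, ': ', needsUtf8mb4($s) ? 'needs utf8mb4' : 'fits in utf8', "\n";
}
```

Only the last sample needs `utf8mb4`; the 2- and 3-byte characters fit in either character set.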