Preventing cURL

Date: 2019-07-13 08:11:45

Tags: php security web-scraping cron server-administration

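I have set allow_url_fopen=0 to block scraping tools. The setting is applied globally, and I do not allow local php.ini files to override it. However, I noticed that a scraper built on cURL gets around this flag. See the page-copying function below: with it, I successfully copied a page from a server configured with allow_url_fopen=0.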

public function handle()
{
    try{
        if( ini_get('allow_url_fopen') ) {
            Log::info('Flag allow_url_fopen is enabled');
            $html = new Htmldom('page_url_here');
        } else {
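            Log::info('Flag allow_url_fopen is disabled, trying with cURL');
            $webpage = EventCron::get_web_page('page_url_here');
            // Treat a cURL failure as an error instead of parsing an empty page.
            if ( $webpage['errno'] !== 0 ) {
                throw new \RuntimeException($webpage['errmsg']);
            }
            $html = new Htmldom($webpage['content']);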
        }
        /*Doing some magical stuff with the site content */
        $agenda = $html->find('div.articles' , 0);

        Log::info('success');
    }catch(\Exception $e){
        Log::error('Event Cron Error '.$e->getMessage());
    }
}

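public static function get_web_page( $url, $cookiesIn = '' ){
    $options = array(
        CURLOPT_RETURNTRANSFER => true,   // return the response instead of printing it
        CURLOPT_HEADER         => true,   // include response headers in the output
        CURLOPT_FOLLOWLOCATION => true,   // follow redirects
        CURLOPT_ENCODING       => "",     // accept all supported encodings
        CURLOPT_AUTOREFERER    => true,   // set Referer automatically on redirect
        CURLOPT_CONNECTTIMEOUT => 120,    // seconds to wait while connecting
        CURLOPT_TIMEOUT        => 120,    // seconds allowed for the whole request
        CURLOPT_MAXREDIRS      => 10,     // stop after 10 redirects
        CURLINFO_HEADER_OUT    => true,   // track the outgoing request headers
        CURLOPT_SSL_VERIFYPEER => true,   // verify the peer's certificate
        CURLOPT_HTTP_VERSION   => CURL_HTTP_VERSION_1_1,
        CURLOPT_COOKIE         => $cookiesIn
    );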

    $ch = curl_init( $url );
    curl_setopt_array( $ch, $options );
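    $rough_content = curl_exec( $ch );
    $err = curl_errno( $ch );
    $errmsg = curl_error( $ch );
    $header = curl_getinfo( $ch );
    curl_close( $ch );

    // curl_exec() returns false on failure; use an empty string so the parsing below stays safe.
    if ( $rough_content === false ) {
        $rough_content = '';
    }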

    // Split the raw response at the header boundary reported by cURL.
    $header_content = substr($rough_content, 0, $header['header_size']);
    $body_content = trim(substr($rough_content, $header['header_size']));
    // Header names are case-insensitive, hence the "i" flag.
    $pattern = "#Set-Cookie:\\s+(?<cookie>[^=]+=[^;]+)#mi";
    preg_match_all($pattern, $header_content, $matches);
    $cookiesOut = implode("; ", $matches['cookie']);

    return array(
        'errno'   => $err,
        'errmsg'  => $errmsg,
        'headers' => $header_content,
        'content' => $body_content,
        'cookies' => $cookiesOut,
    );
}

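Now the question is: how can I prevent pages from being scraped? If nothing lets us do this, it may be a security issue in PHP. I found one workaround, which is to disable the cURL library, but that is not the right solution. Some of my hosted projects need cURL, since it is the most commonly used library and is popular among web developers.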
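
For reference, the two settings mentioned above could look roughly like this in the global php.ini. This is only a sketch: the list of cURL functions under disable_functions is an assumption about what "disabling cURL" would mean here, not something taken from the question. Note also that both directives only restrict outbound requests made by PHP scripts running under this php.ini; as far as I know they have no effect on what a remote client can fetch from the site.

```ini
; Global php.ini (sketch). Both directives are system-level settings
; and cannot be changed at runtime with ini_set().

; Disables URL wrappers (http://, ftp://) for fopen(), file_get_contents(), etc.
allow_url_fopen = Off

; Assumed workaround from the question: block the cURL entry points.
; This breaks every hosted project that relies on cURL.
disable_functions = curl_init,curl_exec,curl_multi_init,curl_multi_exec
```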

0 answers:

No answers yet