How can I optimize this code to extract a page title

Time: 2011-05-17 21:30:31

Tags: php

Below is the sample code I use to extract the title of any website:

function fread_url($url, $ref = "")
{
    if (function_exists("curl_init")) {
        $user_agent = "googlebot";
        $ch = curl_init(); // create the handle once
        curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
        curl_setopt($ch, CURLOPT_HTTPGET, 1);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
        $html = curl_exec($ch);
        curl_close($ch);
    } else {
        // Fall back to file_get_contents() when cURL is unavailable.
        $html = file_get_contents($url);
    }
    return $html;
}
////////////////////////////////////
$doc = new DOMDocument();
@$doc->loadHTML(@fread_url($urweb)); // @ suppresses warnings from malformed HTML
$titlelist = $doc->getElementsByTagName("title");
$wbtitle = ''; // default when the page has no <title>
if ($titlelist->length > 0) {
    $wbtitle = $titlelist->item(0)->nodeValue;
}
echo $wbtitle;

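My question is: how can I modify this script so that it spends at most 5 seconds on a website and returns an empty value if no title is available? Right now, extracting the title takes longer than 5 seconds for some websites.

2 answers:

Answer 0: (score: 3)

Set a timeout for cURL.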

curl_setopt($ch, CURLOPT_TIMEOUT, 5);

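It looks like you are trying to do this with CURLOPT_CONNECTTIMEOUT, but that is

  the number of seconds to wait while trying to connect,

whereas CURLOPT_TIMEOUT is

  the maximum number of seconds to allow cURL functions to execute.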

http://php.net/manual/en/function.curl-setopt.php
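
As a rough sketch of how the two options fit together, something like the following should give up after about five seconds and return an empty string on failure. The function name fetch_html_with_timeout and its $seconds parameter are made up for illustration and are not part of the question's code; it also assumes the cURL extension is loaded.

```php
<?php
// Hypothetical helper: fetch a page, giving up after $seconds in total.
function fetch_html_with_timeout($url, $seconds = 5)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);     // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);     // follow redirects
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $seconds); // limits the connection phase only
    curl_setopt($ch, CURLOPT_TIMEOUT, $seconds);        // limits the whole transfer
    $html = curl_exec($ch);                             // false on error or timeout
    // The handle is released when $ch goes out of scope.
    return $html === false ? '' : $html;
}
```

Since curl_exec() returns false when the timeout is hit, the caller only has to check for an empty string before passing the result to DOMDocument.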

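Answer 1: (score: 0)

You can rewrite the function completely, as shown below. If you need to keep the fread_url() function, you can create a separate function instead.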

function get_page_title($url, $ref = "") {
    if (function_exists("curl_init")) {
        $user_agent = "googlebot";
        $ch = curl_init(); // create the handle once
        curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
        curl_setopt($ch, CURLOPT_HTTPGET, 1);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
        $html = curl_exec($ch);
        curl_close($ch);
    } else {
        // Fall back to file_get_contents() when cURL is unavailable.
        $html = file_get_contents($url);
    }

    // The fetch failed or returned nothing.
    if ($html === false || empty($html))
        return false;

    $doc = new DOMDocument();
    @$doc->loadHTML($html); // @ suppresses warnings from malformed HTML
    $titlelist = $doc->getElementsByTagName("title");

    // Return the first <title>, or an empty string if there is none.
    return $titlelist->length > 0 ? $titlelist->item(0)->nodeValue : '';
}

$wbtitle = get_page_title($urweb);
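
Note that the file_get_contents() fallback above has no timeout of its own. A minimal sketch of one way to add it, assuming the HTTP stream wrapper's timeout context option is acceptable for your case (the function name fetch_html_fallback is hypothetical):

```php
<?php
// Hypothetical fallback for hosts without the cURL extension.
function fetch_html_fallback($url, $seconds = 5)
{
    $context = stream_context_create(array(
        'http' => array(
            'timeout'    => $seconds,    // timeout in seconds for the HTTP stream
            'user_agent' => 'googlebot', // same user agent as the cURL branch
        ),
    ));
    // @ hides the warning PHP raises when the request fails or times out.
    $html = @file_get_contents($url, false, $context);
    return $html === false ? '' : $html;
}
```

As far as I know this is a read timeout, so unlike CURLOPT_TIMEOUT it may not strictly cap the total time spent on a slow site.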