Here is the sample code I use to extract the title of any website:
```php
function fread_url($url, $ref = "")
{
    $html = "";
    if (function_exists("curl_init")) {
        $ch = curl_init();
        $user_agent = "googlebot";
        curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
        curl_setopt($ch, CURLOPT_HTTPGET, 1);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
        $html = curl_exec($ch);
        curl_close($ch);
    } else {
        // Fallback when the cURL extension is not installed
        $html = file_get_contents($url);
    }
    return $html;
}
////////////////////////////////////
$wbtitle = ""; // stays empty if the page has no <title>
$doc = new DOMDocument();
@$doc->loadHTML(@fread_url($urweb));
$titlelist = $doc->getElementsByTagName("title");
if ($titlelist->length > 0) {
    $wbtitle = $titlelist->item(0)->nodeValue;
}
echo $wbtitle;
```
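My question: how can I modify this script so that it gives up on a site after 5 seconds and returns an empty string if no title has been retrieved by then? At the moment, extracting the title from some websites takes more than 5 seconds.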
Answer 0 (score: 3)
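Set a timeout for cURL:

```php
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
```

It looks like you are trying to do this with `CURLOPT_CONNECTTIMEOUT`, but that option is "the number of seconds to wait while trying to connect", whereas `CURLOPT_TIMEOUT` is "the maximum number of seconds to allow cURL functions to execute". A server that accepts the connection quickly but sends the page slowly can therefore keep the request running well past 5 seconds unless `CURLOPT_TIMEOUT` is also set.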
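A minimal sketch that applies both limits and returns an empty string on any failure or timeout, which is what the question asks for. `fetch_with_timeout` is a hypothetical helper name, not part of the question's code. The `file_get_contents()` fallback uses the `http` stream-context `timeout` option; that is a read timeout rather than a cap on total time, so it only approximates the 5-second limit.

```php
<?php
// Hypothetical helper: fetch $url, giving up after roughly $seconds.
// Returns the body as a string, or '' on any failure or timeout.
function fetch_with_timeout(string $url, int $seconds = 5): string
{
    if (function_exists('curl_init')) {
        $ch = curl_init();
        curl_setopt_array($ch, [
            CURLOPT_URL            => $url,
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_FOLLOWLOCATION => true,
            CURLOPT_CONNECTTIMEOUT => $seconds, // limits only the connect phase
            CURLOPT_TIMEOUT        => $seconds, // limits the whole transfer
        ]);
        $html = curl_exec($ch); // string on success, false on error/timeout
        curl_close($ch);
    } else {
        // Fallback: the http "timeout" context option is a read timeout,
        // not a hard cap on the total request time.
        $ctx  = stream_context_create(['http' => ['timeout' => $seconds]]);
        $html = @file_get_contents($url, false, $ctx);
    }
    return is_string($html) ? $html : '';
}
```

The caller can then treat `''` as "no page, so no title" without distinguishing between a timeout and other errors.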
Answer 1 (score: 0)
You could rewrite the function entirely, as shown below. If you need to keep `fread_url()`, you can create a separate function instead.
```php
function get_page_title($url, $ref = "") {
    if (function_exists("curl_init")) {
        $ch = curl_init();
        $user_agent = "googlebot";
        curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
        curl_setopt($ch, CURLOPT_HTTPGET, 1);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
        $html = curl_exec($ch);
        curl_close($ch);
    } else {
        $html = file_get_contents($url);
    }
    // The fetch failed (false) or returned an empty body
    if (empty($html)) {
        return false;
    }
    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    $titlelist = $doc->getElementsByTagName("title");
    // Empty string when the page has no <title> element
    return $titlelist->length > 0 ? $titlelist->item(0)->nodeValue : '';
}
$wbtitle = get_page_title($urweb);
```
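The parsing half of `get_page_title()` can also be split out so it is easy to check against a literal HTML string. This is a minimal sketch; `extract_title` is a hypothetical name. It replaces the `@` error suppression with `libxml_use_internal_errors()`, guards against empty input (`DOMDocument::loadHTML()` rejects an empty string), and additionally trims and collapses whitespace in the title, which the answer's code does not do.

```php
<?php
// Hypothetical helper: return the text of the first <title> element in
// $html, or '' when the input is empty or has no title.
function extract_title(string $html): string
{
    if (trim($html) === '') {
        return ''; // loadHTML() refuses empty input
    }
    $doc = new DOMDocument();
    // Collect malformed-HTML warnings internally instead of using @.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titles = $doc->getElementsByTagName('title');
    if ($titles->length === 0) {
        return '';
    }
    // Collapse runs of spaces/newlines inside the title, then trim.
    return trim(preg_replace('/\s+/u', ' ', $titles->item(0)->nodeValue));
}

echo extract_title("<html><head><title> Example\n  Domain </title></head></html>");
// prints "Example Domain"
```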