我正在开发一个涉及使用cURL或file_get_contents获取页面的项目。问题是,当我尝试回显所提取的html时,输出似乎与原始页面不同,并非所有图像都显示出来。请问我想知道是否有解决方案。我的代码
<?php
//Get the url
$url = "http://www.google.com";
//Get the html of url
function get_data($url)
{
$ch = curl_init();
$timeout = 5;
//$userAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US)AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.X.Y.Z Safari/525.13.";
$userAgent = "IE 7 – Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)";
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch,CURLOPT_CONNECTTIMEOUT,$timeout);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$html = file_get_contents($url);
echo $html;
?>
由于
答案 0 :(得分:8)
您应该使用<base>
为所有相关链接指定基本网址:
如果您卷曲http://example.com/thisPage.html
,请在回显的''输出中添加base
标记。从技术上讲,这应该在<head>
中,但这可行:
echo '<base href="http://example.com/" />';
echo $html;
答案 1 :(得分:1)
使用它
//Get the html of url
function get_data($url)
{
$ch = curl_init();
$timeout = 5;
//$userAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US)AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.X.Y.Z Safari/525.13.";
$userAgent = "IE 7 – Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)";
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch,CURLOPT_CONNECTTIMEOUT,$timeout);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$parse = parse_url($url);
$count = "http://".$parse['host'].dirname($parse['path'])."//";
$page = str_replace("<head>", "<head>\n<base href=\"" . $count . "\" />", $page);
$page = str_replace("<HEAD>", "<head>\n<base href=\"" . $count . "\" />", $page);
echo $page;
?>