How do I get the full HTML content of a specific URL?

Time: 2015-07-21 05:24:09

Tags: php dom

I have tried several ways of fetching the HTML content of aptoide.com in PHP.

1)file_get_contents();

2)readfile();

3)cURL, wrapped in a PHP function

function get_dataa($url) {
   $ch = curl_init($url);
   curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
   curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
   curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
   curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (compatible; Konqueror/4.0; Microsoft Windows) KHTML/4.0.80 (like Gecko)");
   $data = curl_exec($ch);
   curl_close($ch);
   return $data;
}

4)PHP Simple HTML DOM Parser

include_once('simple_html_dom.php');
$url="http://aptoide.com";
$html = file_get_html($url);

But all of these return empty output for aptoide.com. Is there any way to get the full HTML content of that URL?

2 answers:

Answer 0 (score: 0)

Add this line to your cURL get_dataa function:

curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

The page redirects to www.aptoide.com, so cURL has to be told to follow redirects. The full function:

function get_dataa($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow the redirect to www.aptoide.com
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (compatible; Konqueror/4.0; Microsoft Windows) KHTML/4.0.80 (like Gecko)");
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
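
If the output is still empty after that change, it can help to look at what cURL reports about the transfer. The following is a minimal sketch, not part of the original answer: fetch_with_diagnostics() and summarize_fetch() are hypothetical helper names, and the redirect limit and timeout are assumed values.

```php
<?php
// Hypothetical helper (not from the original answer): fetch a URL with cURL
// and keep the transfer details needed to explain an empty result.
function fetch_with_diagnostics($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow 3xx redirects
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);         // assumed limit, guards against redirect loops
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);          // assumed timeout in seconds
    $body = curl_exec($ch);
    $result = array(
        'body'      => $body, // string on success, false on failure
        'error'     => curl_error($ch),
        'http_code' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'final_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
        'redirects' => curl_getinfo($ch, CURLINFO_REDIRECT_COUNT),
    );
    curl_close($ch);
    return $result;
}

// Hypothetical helper: turn the result array into a one-line explanation.
function summarize_fetch(array $r) {
    if ($r['body'] === false) {
        return 'cURL error: ' . $r['error'];
    }
    if ($r['http_code'] >= 300 && $r['http_code'] < 400) {
        return 'Redirect not followed (HTTP ' . $r['http_code'] . ')';
    }
    if ($r['http_code'] >= 400) {
        return 'Server refused the request (HTTP ' . $r['http_code'] . ')';
    }
    if ($r['body'] === '') {
        return 'Empty body from ' . $r['final_url'];
    }
    return sprintf('OK: %d bytes from %s after %d redirect(s)',
        strlen($r['body']), $r['final_url'], $r['redirects']);
}
```

Calling `echo summarize_fetch(fetch_with_diagnostics('http://aptoide.com'));` would then show whether the request failed outright, stopped at a redirect, or was rejected by the server.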

Answer 1 (score: 0)

echo file_get_contents('http://www.aptoide.com/'); works fine for me.
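
To see how the server answers the same call from your machine, one option is to pass an explicit stream context and read the status line that PHP's HTTP wrapper records. This is only a sketch under assumptions: fetch_page() and last_status_code() are hypothetical names, and the user agent string, redirect limit, and timeout are made-up values.

```php
<?php
// Hypothetical helper: pick the final status code out of the header lines
// collected by PHP's HTTP wrapper (one "HTTP/..." line per hop when redirected).
function last_status_code(array $headers) {
    $code = null;
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1]; // later hops overwrite earlier ones
        }
    }
    return $code;
}

// Hypothetical helper: file_get_contents() with an explicit stream context.
function fetch_page($url) {
    $context = stream_context_create(array('http' => array(
        'user_agent'      => 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)', // assumed value
        'follow_location' => 1,    // follow redirects
        'max_redirects'   => 5,    // assumed limit
        'timeout'         => 15,   // assumed timeout in seconds
        'ignore_errors'   => true, // return the body even on 4xx/5xx responses
    )));
    $body = @file_get_contents($url, false, $context);
    // The HTTP wrapper fills $http_response_header in the local scope;
    // it stays unset if no response was received at all.
    $headers = isset($http_response_header) ? $http_response_header : array();
    return array('body' => $body, 'status' => last_status_code($headers));
}
```

A status of 403, or no status at all, from `fetch_page('http://www.aptoide.com/')` would suggest the request is being refused or never reaches the server.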

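Since it works from my machine, aptoide.com may be blocking you. If you want to change your IP (as you mentioned in the comments), you have to go through a proxy like this: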

$url = 'http://aptoide.com/';
$proxy = '127.0.0.1:9095'; // Your proxy
// $proxyauth = 'user:password'; // Proxy authentication if required

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_PROXY, $proxy);
//curl_setopt($ch, CURLOPT_PROXYUSERPWD, $proxyauth);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1); // include the response headers in the output
$curl_scraped_page = curl_exec($ch);
curl_close($ch);

echo $curl_scraped_page;
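
Because CURLOPT_HEADER is enabled, the echoed string starts with the response headers rather than the HTML. If only the page itself is wanted, the headers can be cut off using the header size cURL reports. A minimal sketch, assuming the same proxy setup as above; split_response() and fetch_via_proxy() are hypothetical names, not part of the original answer.

```php
<?php
// Hypothetical helper: split a raw cURL response (fetched with CURLOPT_HEADER
// enabled) into its header block and its body.
function split_response($raw, $headerSize) {
    return array(
        'headers' => substr($raw, 0, $headerSize),
        // The cast keeps the body a string on older PHP versions,
        // where substr() may return false for an empty remainder.
        'body'    => (string) substr($raw, $headerSize),
    );
}

// Hypothetical helper: fetch through a proxy and return headers and body separately.
function fetch_via_proxy($url, $proxy) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_PROXY, $proxy);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true);
    $raw = curl_exec($ch);
    // Total size of all header blocks received, including those of redirect hops.
    $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
    curl_close($ch);
    return $raw === false ? false : split_response($raw, $headerSize);
}
```

With this, `$page = fetch_via_proxy('http://aptoide.com/', '127.0.0.1:9095');` followed by `echo $page['body'];` would print only the HTML, provided the request succeeded.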