我使用了几种方法在php中获取aptoide.com的html内容。
1)file_get_contents();
2)readfile();
3)curl as php function
function get_dataa($url) {
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (compatible; Konqueror/4.0; Microsoft Windows) KHTML/4.0.80 (like Gecko)");
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
4)PHP Simple HTML DOM Parser
include_once('simple_html_dom.php');
$url="http://aptoide.com";
$html = file_get_html($url);
但是所有这些都为aptoide.com提供了空输出 有没有办法获得该网址的完整html内容?
答案 0 :(得分:0)
使用你的curl get_dataa函数添加这一行:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
因为该页面正在重定向到www.aptide.com 全功能:
function get_dataa($url) {
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (compatible; Konqueror/4.0; Microsoft Windows) KHTML/4.0.80 (like Gecko)");
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
答案 1 :(得分:0)
echo file_get_contents('http://www.aptoide.com/');
对我来说很好。
因此aptoide.com
可能会阻止您。如果你想改变你的IP(正如你在评论中所说的那样),你必须使用它:
$url = 'http://aptoide.com.com/';
$proxy = '127.0.0.1:9095'; // Your proxy
// $proxyauth = 'user:password'; // Proxy authentication if required
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_PROXY, $proxy);
//curl_setopt($ch, CURLOPT_PROXYUSERPWD, $proxyauth);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
$curl_scraped_page = curl_exec($ch);
curl_close($ch);
echo $curl_scraped_page;