I want to get the source code of a remote website, so I used:
<?php
include_once('simple_html_dom.php');
$f = file_get_contents("http://163.53.77.55");
echo htmlspecialchars( $f );
That gave me the source code. But now I want the source of this page:
$f = file_get_contents("http://163.53.77.55/offers/");
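and I get this error:
Warning: file_get_contents(http://163.53.77.55/offers): failed to open stream: HTTP request failed! HTTP/1.1 500 Server Error
In other words, it is as if I could view the source of stackoverflow.com but not the source of stackoverflow.com/questions/!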
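Answer 0 (score: 1)
You have to use cURL. But first turn off JavaScript in your browser and check whether the information you need is still on the page. For example, the offers page loads its images via JavaScript.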
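The same check can be approximated from PHP instead of the browser. The following is only a rough sketch: `static_image_sources` is a hypothetical helper name and the sample markup is made up. It lists the `<img>` sources that exist in the raw HTML, which is all that `file_get_contents` or cURL will ever see, since neither runs scripts.

```php
<?php
// Hypothetical helper: list the src attributes of <img> tags that are
// present in the raw HTML, i.e. before any JavaScript has run.
function static_image_sources(string $html): array
{
    // Silence libxml warnings on sloppy real-world markup.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src !== '') {
            $sources[] = $src;
        }
    }
    return $sources;
}

// Made-up sample: one image is in the markup, the rest would be
// injected by a script at runtime and is therefore invisible here.
$sample = '<html><body><img src="/a.png"><div id="offers"></div>'
        . '<script>/* images added here at runtime */</script></body></html>';
print_r(static_image_sources($sample));
```

If the images you are after do not show up in this list for the downloaded page, they are most likely added by JavaScript and fetching the HTML alone will not be enough.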
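The designers of this page are trying to discourage you from scraping it.
When you use cURL, send the User-Agent of an old smartphone.
This works: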
$request = array();
$request[] = "Host: www.flipkart.com";
$request[] = "Connection: keep-alive";
$request[] = "Cache-Control: no-cache";
$request[] = "Pragma: no-cache";
$request[] = "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
$request[] = "User-Agent: MOT-V9mm/00.62 UP.Browser/6.2.3.4.c.1.123 (GUI) MMP/2.0";
$request[] = "Accept-Language: en-US,en;q=0.5";
$ch = curl_init('http://www.flipkart.com/offers/');
curl_setopt($ch, CURLOPT_ENCODING, "");            // accept any encoding cURL supports
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);    // return the response instead of printing it
curl_setopt($ch, CURLOPT_HEADER, true);            // include response headers in the output
curl_setopt($ch, CURLINFO_HEADER_OUT, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);   // do not follow redirects
curl_setopt($ch, CURLOPT_FILETIME, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 100);     // seconds
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 100);            // seconds
curl_setopt($ch, CURLOPT_FAILONERROR, true);       // treat HTTP status >= 400 as an error
curl_setopt($ch, CURLOPT_HTTPHEADER, $request);
$data = curl_exec($ch);
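if (curl_errno($ch)) {
    // curl_exec() returned false, so show the error message instead
    $data = 'Retrieve Base Page Error: ' . curl_error($ch);
}
else {
    // Split the response into the header block and the body
    $skip = intval(curl_getinfo($ch, CURLINFO_HEADER_SIZE));
    $head = substr($data, 0, $skip);
    $data = substr($data, $skip);
}
echo $data;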
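The header/body split at the end of that code can be pulled into a small function, which also makes it easy to read the status code. This is just a sketch under assumptions: `split_http_response` is a hypothetical name, the response string below is made up, and it assumes a single header block (no redirects followed, as in the options above).

```php
<?php
// Hypothetical helper: split a raw response (as returned by curl_exec()
// with CURLOPT_HEADER enabled) into status code, header block and body.
function split_http_response(string $raw, int $headerSize): array
{
    $head = substr($raw, 0, $headerSize);
    $body = substr($raw, $headerSize);

    // The status line looks like "HTTP/1.1 200 OK"; take the numeric code.
    $status = 0;
    if (preg_match('#^HTTP/\S+\s+(\d{3})#', $head, $m)) {
        $status = (int) $m[1];
    }
    return ['status' => $status, 'head' => $head, 'body' => $body];
}

// Made-up response for illustration. With cURL, $headerSize would come
// from curl_getinfo($ch, CURLINFO_HEADER_SIZE) instead of strpos().
$raw = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>offers</html>";
$headerSize = strpos($raw, "\r\n\r\n") + 4;
$parts = split_http_response($raw, $headerSize);
echo $parts['status'], "\n", htmlspecialchars($parts['body']), "\n";
```

Keeping the status code around lets you tell a blocked request (for example a 500 like the one in the question) apart from an empty but successful page.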