The link stored in the variable $link redirects to a product page. It works fine when I open it in a browser, but it does not work with PHP's file_get_contents function.
My code:
$url = "750651";
$link = "http://www.costco.com/CatalogSearch?storeId=10301&catalogId=10701&langId=-1&keyword=$url";
$link = str_replace('&amp;','&',$link);
$res = file_get_contents(html_entity_decode(urldecode($link)));
Error:
Warning: file_get_contents(http://www.costco.com/CatalogSearch?storeId=10301&catalogId=10701&langId=-1&keyword=750651): failed to open stream: HTTP request failed! HTTP/1.0 403 Forbidden
How can I prevent & from being converted to &amp; in the file_get_contents function?
I also tried the following code, without success:
$link = "http://www.costco.com/CatalogSearch?";
$options = array("storeId"=>"10301","catalogId"=>"10701","langId"=>"-1","keyword"=>$url);
$link .= http_build_query($options,'','&');
$res = file_get_contents($link);
Answer 0 (score: 1)
I also found an alternative function. I hope it is useful.
function get_fcontent( $url, $javascript_loop = 0, $timeout = 5 ) {
    // Convert HTML-escaped ampersands back to plain "&" before requesting the URL.
    $url = str_replace( "&amp;", "&", urldecode(trim($url)) );
    $cookie = tempnam ("/tmp", "CURLCOOKIE");
    $ch = curl_init();
    curl_setopt( $ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1" );
    curl_setopt( $ch, CURLOPT_URL, $url );
    curl_setopt( $ch, CURLOPT_COOKIEJAR, $cookie );
    curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, true );
    curl_setopt( $ch, CURLOPT_ENCODING, "" );
    curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
    curl_setopt( $ch, CURLOPT_AUTOREFERER, true );
    curl_setopt( $ch, CURLOPT_SSL_VERIFYPEER, false ); # disables certificate verification for https urls (insecure)
    curl_setopt( $ch, CURLOPT_CONNECTTIMEOUT, $timeout );
    curl_setopt( $ch, CURLOPT_TIMEOUT, $timeout );
    curl_setopt( $ch, CURLOPT_MAXREDIRS, 10 );
    $content = curl_exec( $ch );
    $response = curl_getinfo( $ch );
    curl_close ( $ch );
    // If the final response is still a redirect, follow its Location header manually.
    if ($response['http_code'] == 301 || $response['http_code'] == 302) {
        ini_set("user_agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1");
        if ( $headers = get_headers($response['url']) ) {
            foreach( $headers as $value ) {
                if ( substr( strtolower($value), 0, 9 ) == "location:" )
                    return get_fcontent( trim( substr( $value, 9 ) ) );
            }
        }
    }
    // Follow JavaScript redirects (window.location), at most 5 levels deep.
    if ( ( preg_match("/>[[:space:]]+window\.location\.replace\('(.*)'\)/i", $content, $value) || preg_match("/>[[:space:]]+window\.location\=\"(.*)\"/i", $content, $value) ) && $javascript_loop < 5) {
        return get_fcontent( $value[1], $javascript_loop+1 );
    } else {
        return array( $content, $response );
    }
}
To view the result:
$lurl=get_fcontent($link);
echo $lurl[0];
Answer 1 (score: 1)
I do it this way:
$myURL = 'http://www.costco.com/CatalogSearch?';
$options = array("storeId"=>10301,"catalogId"=>10701,"langId"=>-1,"keyword"=>$url);
$myURL .= http_build_query($options,'','&');
$myData = file_get_contents($myURL);
It works well. Try this.
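As a minimal sketch of the approach above, the query string can be built in one place so the separator is always a plain `&`. The function name `build_search_url` is hypothetical and only used for illustration; the base URL and parameter values are the ones from the question.

```php
<?php
// Hypothetical helper (not part of the original answer): builds the search URL
// from the parameters shown in the question.
function build_search_url(string $keyword): string
{
    $base = 'http://www.costco.com/CatalogSearch?';
    $params = [
        'storeId'   => 10301,
        'catalogId' => 10701,
        'langId'    => -1,
        'keyword'   => $keyword,
    ];
    // Pass '&' explicitly so the result does not depend on the
    // arg_separator.output ini setting.
    return $base . http_build_query($params, '', '&');
}

echo build_search_url('750651'), PHP_EOL;
```

The resulting string can then be passed to file_get_contents or cURL. Whether the server accepts the request is a separate matter that this sketch does not address.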
Answer 2 (score: 0)
Try it without urldecode and html_entity_decode, or apply them before the string replacement:
$link = "http://www.costco.com/CatalogSearch?storeId=10301&catalogId=10701&langId=-1&keyword=$url";
$link = str_ireplace('&amp;','&', html_entity_decode(urldecode($link)));
$res = file_get_contents($link);
$res = file_get_contents($link);
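To illustrate the decoding step, here is a small sketch that turns an HTML-escaped link into a plain URL. The helper name `normalize_link` is hypothetical. It deliberately skips urldecode, on the assumption that the link is only HTML-escaped and not percent-encoded as a whole.

```php
<?php
// Hypothetical helper (not from the original answer): converts HTML entities
// such as "&amp;" back to their literal characters.
function normalize_link(string $link): string
{
    return html_entity_decode(trim($link), ENT_QUOTES, 'UTF-8');
}

$escaped = 'http://www.costco.com/CatalogSearch?storeId=10301&amp;catalogId=10701&amp;langId=-1&amp;keyword=750651';
echo normalize_link($escaped), PHP_EOL;
```

A link that contains no entities passes through unchanged, so the helper can be applied whether or not the input was escaped.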
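Not every site can be fetched with file_get_contents. Note that the same-origin policy and the Access-Control-Allow-Origin header only apply to scripts running in a browser, not to server-side PHP, so they are not the cause here. A 403 more likely means the server is rejecting the request, for example because it lacks browser-like headers such as User-Agent. You can try downloading the page with cURL instead, as follows: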
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $link);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // if you want to follow redirects
$data = curl_exec($ch);
curl_close($ch);
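If you would rather stay with file_get_contents, one option is to pass a stream context that sends a User-Agent header. This is only a sketch: it assumes the 403 is caused by the missing header, which the thread does not confirm. The helper name `make_http_context` and the user-agent string are made up for illustration.

```php
<?php
// Hypothetical helper: builds an HTTP stream context with a User-Agent header.
function make_http_context(string $userAgent, int $timeout = 5)
{
    return stream_context_create([
        'http' => [
            'method'          => 'GET',
            'user_agent'      => $userAgent,
            'timeout'         => $timeout,
            'follow_location' => 1,    // follow redirects
            'max_redirects'   => 10,
            'ignore_errors'   => true, // return the body even on 4xx/5xx responses
        ],
    ]);
}

// Assumed user-agent string, for illustration only.
$context = make_http_context('Mozilla/5.0 (compatible; ExampleFetcher/1.0)');

// Usage (performs a network request, so it is left commented out here):
// $res = file_get_contents($link, false, $context);
```

With ignore_errors enabled, file_get_contents returns the response body instead of failing on an error status, and the status line can be read from $http_response_header after the call.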