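I want to retrieve the contents of a website, but its URLs are built with an exclamation mark in them, and that does not seem to work.
What I have tried: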
<?php
echo file_get_contents('https://domain.com/path/!weird.formatted?url=1');
echo file_get_contents('https://domain.com/path/%21weird.formatted?url=1');
echo file_get_contents(urlencode('https://domain.com/path/!weird.formatted?url=1'));
echo file_get_contents(rawurlencode('https://domain.com/path/!weird.formatted?url=1'));
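A note on the last two attempts: the minimal sketch below shows what urlencode() returns when it is given a complete URL rather than a single path segment or query value. It reuses the placeholder URL from above; the variable names $encoded and $partial are only for illustration. Because the scheme and path delimiters are escaped as well, the result is no longer a URL, so those two calls would fail regardless of how the server treats the exclamation mark.

```php
<?php
// Illustration only: encoding a complete URL escapes its delimiters too.
$url = 'https://domain.com/path/!weird.formatted?url=1';

$encoded = urlencode($url);
echo $encoded, PHP_EOL;
// https%3A%2F%2Fdomain.com%2Fpath%2F%21weird.formatted%3Furl%3D1

// The encoded string has no scheme left, so it is not treated as an HTTP URL.
var_dump(parse_url($encoded, PHP_URL_SCHEME)); // NULL

// Encoding only the path segment keeps the URL structure intact.
$partial = 'https://domain.com/path/' . rawurlencode('!weird.formatted') . '?url=1';
echo $partial, PHP_EOL;
// https://domain.com/path/%21weird.formatted?url=1
```

The last form is equivalent to the second attempt above, which suggests the exclamation mark itself may not be the cause of the failure.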
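I also tried retrieving the content with PHP cURL, but the exclamation mark seems to be a problem there as well.
So how can I retrieve this page? Any suggestions would be much appreciated.
Update
The URL I am trying to retrieve content from: https://loket.bunnik.nl/mozard/!suite86.scherm0325?mPag=1070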
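Answer 0 (score: 2)
So the problem was that the page checks for a valid user agent / cookie; the exclamation mark was not the cause. The code I used to solve it: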
<?php
echo getPage("https://loket.bunnik.nl/mozard/!suite86.scherm0325?mPag=1070");
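// Fetch a page with cURL, sending a browser-like user agent and persisting cookies.
function getPage($url) {
    $useragent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.89 Safari/537.36';
    $timeout = 120;

    // REMOTE_ADDR is not set when the script runs from the CLI, so fall back to a fixed name.
    $client = isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : 'cli';
    $cookie_dir = dirname(__FILE__) . '/cookies';
    if (!is_dir($cookie_dir)) {
        // The cookie jar cannot be written if its directory is missing.
        mkdir($cookie_dir, 0700, true);
    }
    $cookie_file = $cookie_dir . '/' . md5($client) . '.txt';

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_setopt($ch, CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file); // read cookies from this file
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);  // write cookies back when the handle closes
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_ENCODING, "");             // accept every encoding cURL supports
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
    curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
    curl_setopt($ch, CURLOPT_REFERER, 'http://www.google.com/');

    $content = curl_exec($ch);
    // Read the error message before closing the handle, and always close it.
    $error = ($content === false) ? curl_error($ch) : '';
    curl_close($ch);

    if ($content === false) {
        echo 'error:' . $error;
        return false;
    }
    return $content;
}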
?>
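If the user agent check is the only obstacle, the same idea can probably be expressed with file_get_contents() and a stream context. The following is a minimal sketch, not the method used above: buildBrowserContext() is a hypothetical helper name, and the snippet assumes that allow_url_fopen is enabled and the openssl extension is loaded for https URLs. Unlike the cURL version, it does not persist cookies between requests, so it may not be enough if the site insists on a cookie.

```php
<?php
// Hypothetical helper (not part of the answer's code): builds an HTTP stream
// context that sends a browser-like User-Agent header.
function buildBrowserContext($userAgent, $timeout = 30)
{
    return stream_context_create(array(
        'http' => array(
            'method'          => 'GET',
            'user_agent'      => $userAgent, // sent as the User-Agent header
            'follow_location' => 1,          // follow redirects
            'max_redirects'   => 10,
            'timeout'         => $timeout,   // read timeout in seconds
        ),
    ));
}

$userAgent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.89 Safari/537.36';
$context = buildBrowserContext($userAgent);

// Usage (performs a network request, so it is left commented out here):
// $html = file_get_contents('https://loket.bunnik.nl/mozard/!suite86.scherm0325?mPag=1070', false, $context);
// if ($html === false) { echo 'request failed'; }
```

The cURL version remains the safer choice when cookies have to survive across requests, since the stream wrapper has no cookie jar of its own.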