How do I retrieve data from a URL that contains an exclamation mark?

Time: 2016-01-05 20:39:21

Tags: php

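I want to retrieve the content of a website, but the site uses an exclamation mark in its URLs, and fetching them does not seem to work.

Things I tried: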

<?php
echo file_get_contents('https://domain.com/path/!weird.formatted?url=1');
echo file_get_contents('https://domain.com/path/%21weird.formatted?url=1');
echo file_get_contents(urlencode('https://domain.com/path/!weird.formatted?url=1'));
echo file_get_contents(rawurlencode('https://domain.com/path/!weird.formatted?url=1'));
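For reference, here is a small sketch of what the last two attempts actually pass to file_get_contents. It makes no network request. The variable names are only illustrative. Note that "!" is allowed unescaped in a URL path under RFC 3986, so the first attempt is already a valid URL.

```php
<?php
$url = 'https://domain.com/path/!weird.formatted?url=1';

// urlencode() and rawurlencode() escape every reserved character, including
// the ":" and "/" of the scheme, so the result is no longer a usable URL.
$encoded    = urlencode($url);
$rawEncoded = rawurlencode($url);

// Encoding only the path segment keeps the URL structure intact.
$segment = rawurlencode('!weird.formatted');
$safeUrl = 'https://domain.com/path/' . $segment . '?url=1';

echo $encoded, "\n";
echo $safeUrl, "\n";
```

Encoding the whole URL turns it into a single opaque string, which file_get_contents then treats as a local file name rather than as an HTTP address.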

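I also tried retrieving the content with PHP cURL, but the exclamation mark seems to be a problem there as well.

So how can I retrieve this web page? Any suggestions would be much appreciated.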

Update

The URL I am trying to retrieve content from: https://loket.bunnik.nl/mozard/!suite86.scherm0325?mPag=1070

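1 answer:

Answer 0: (score: 2)

So the problem was not the exclamation mark: the web page checks for a valid user agent and cookies. The code I used to solve it: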

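<?php
    echo getPage("https://loket.bunnik.nl/mozard/!suite86.scherm0325?mPag=1070");

    // Fetch a page with cURL, sending a browser-like user agent and persisting cookies.
    function getPage ($url) {
        $useragent   = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.89 Safari/537.36';
        $timeout     = 120;
        $dir         = dirname(__FILE__);
        // One cookie file per client; REMOTE_ADDR is not set on the command line.
        // The cookies/ directory must already exist and be writable.
        $client      = isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : 'cli';
        $cookie_file = $dir . '/cookies/' . md5($client) . '.txt';

        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_FAILONERROR, true);       // treat HTTP status >= 400 as an error
        curl_setopt($ch, CURLOPT_HEADER, 0);               // do not include response headers in the output
        curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file); // read cookies from this file
        curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);  // write cookies back to the same file
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_ENCODING, "");            // accept every encoding cURL supports
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);    // return the body instead of printing it
        curl_setopt($ch, CURLOPT_AUTOREFERER, true);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
        curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
        curl_setopt($ch, CURLOPT_REFERER, 'http://www.google.com/');

        $content = curl_exec($ch);
        if (curl_errno($ch)) {
            echo 'error:' . curl_error($ch);
        }
        // Release the handle on both the success and the error path.
        curl_close($ch);

        return $content; // false when the request failed
    }
?>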
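Since the question started from file_get_contents, here is a minimal sketch of how a browser-like user agent could be sent through a stream context instead of cURL. The helper names buildBrowserContext and fetchWithUserAgent are hypothetical and not part of the answer above. The sketch does not store cookies, so it is only an assumption that it would be enough for a site that also checks them.

```php
<?php
// Hypothetical helper: build an HTTP stream context that sends a custom user agent.
// The "http" options also apply to https:// URLs.
function buildBrowserContext($userAgent, $timeout = 120) {
    return stream_context_create(array(
        'http' => array(
            'method'          => 'GET',
            'user_agent'      => $userAgent,
            'follow_location' => 1,   // follow redirects
            'max_redirects'   => 10,
            'timeout'         => $timeout,
        ),
    ));
}

// Hypothetical helper: fetch a URL with that context; returns false on failure.
function fetchWithUserAgent($url, $userAgent) {
    return file_get_contents($url, false, buildBrowserContext($userAgent));
}

$userAgent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.89 Safari/537.36';
$context   = buildBrowserContext($userAgent);

// Usage (performs a network request, so it is left commented out here):
// echo fetchWithUserAgent('https://loket.bunnik.nl/mozard/!suite86.scherm0325?mPag=1070', $userAgent);
```

If the site insists on a cookie round trip, the cURL version with CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR remains the more complete approach.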