cURL `effective_url` does not return the last effective URL

Time: 2016-08-12 12:47:30

Tags: php curl

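I need to get the last effective URL with cURL for this site: http://www.wechat.com/en/. When I use `curl_getinfo($curl, CURLINFO_EFFECTIVE_URL);` I get http://www.wechat.com/, but if you visit the site in a browser you can see the URL change, with /en/ appended at the end.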

Code:

 $url = 'http://www.wechat.com';
 $curl = curl_init($url);
 curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
 curl_setopt($curl, CURLOPT_VERBOSE, false);
 curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 0);
 curl_setopt($curl, CURLOPT_TIMEOUT, 5);
 curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);  // follow HTTP 3xx redirects
 curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
 // curl_getinfo() describes the last transfer, so the request has to be executed first.
 $response = curl_exec($curl);
 $ip = curl_getinfo($curl, CURLINFO_PRIMARY_IP);
 $last_effective_url = curl_getinfo($curl, CURLINFO_EFFECTIVE_URL);
 curl_close($curl);

 echo $last_effective_url;

What am I missing here? Any help would be appreciated.

1 answer:

Answer 0: (score: 0)

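The problem here is that the move to the /en/ URL is not done with an HTTP redirect status code that cURL can interpret. cURL sees HTTP 200 and stops there.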

Instead, the page is redirected by JavaScript running in the browser, which cURL cannot interpret.

So cURL's effective URL is 'http://www.wechat.com', which is the correct behavior.
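One way to confirm this from PHP is to look at the status code and the redirect count that cURL reports for the transfer. The following is a minimal sketch, not a definitive implementation: `describeTransfer()` is a hypothetical helper introduced here for illustration, and it only relies on the `http_code`, `redirect_count` and `url` keys of the array that `curl_getinfo($curl)` returns when called without an option.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the cURL extension): summarizes a finished
 * transfer from the array returned by curl_getinfo($curl).
 */
function describeTransfer(array $info): string
{
    $code      = (int) ($info['http_code'] ?? 0);
    $redirects = (int) ($info['redirect_count'] ?? 0);
    $url       = (string) ($info['url'] ?? '');

    if ($code === 0) {
        // No status line was received (request not executed, or it failed).
        return 'no HTTP response received';
    }
    if ($redirects === 0) {
        // cURL stayed on the requested URL: any later change of address in a
        // browser must come from something cURL does not run, such as JavaScript.
        return sprintf('HTTP %d at %s with no HTTP redirects followed', $code, $url);
    }
    return sprintf('HTTP %d at %s after %d HTTP redirect(s)', $code, $url, $redirects);
}

// Usage (needs network access, so it is left commented out):
// $curl = curl_init('http://www.wechat.com');
// curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
// curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
// curl_exec($curl);
// echo describeTransfer(curl_getinfo($curl)), PHP_EOL;
// curl_close($curl);
```

If the summary shows HTTP 200 with no redirects followed, the effective URL is simply the URL that was requested, which matches the behavior described above.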

Look at the source of the page http://www.wechat.com/ (view-source:http://www.wechat.com/index.html in a browser) to see the JavaScript that runs:

 <script>
        (function () {
            Array.prototype.contain = function (f) {
                var e = this.length;
                while (e--) {
                    if (this[e] === f) {
                        return true
                    }
                }
                return false
            };
            var c = ["en", "zh", "pt", "th", "vi", "id", "es", "ru", "ar", "he", "pl", "hi", "ja", "it", "ko", "ms", "tr"], a = navigator.language || navigator.userLanguage || 'en', a = a.replace(/-\w+/, ""), d = location.pathname.match(/\/(\w+)\/(\w*)/i), b = c.contain(a) ? a : "en", b = b == "zh" ? "zh_TW" : b;
            if (location.pathname == "/index.html" || location.pathname == "/" || location.pathname == "/cgi-bin/readtemplate") {
                location.href = location.protocol + "//" + location.host + "/" + b + "/";
                return
            }
            if (d && d[1]) {
                return
            }
            location.href = location.href.replace(/\w+\/(\w+)\//i, function (e, f) {
                return e.replace(f, b)
            })
        })();
    </script>
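Since cURL will not run this script, one option is to reproduce its first branch in PHP and compute the address a browser would end up on. The sketch below is an assumption-laden port of the script as quoted above: `predictLandingUrl()` and `SUPPORTED_LANGUAGES` are hypothetical names, the browser language has to be supplied by the caller (cURL has no `navigator.language`), only the entry-page branch is modelled, and it stops being accurate as soon as the site changes its script.

```php
<?php
declare(strict_types=1);

// Language codes copied from the quoted script.
const SUPPORTED_LANGUAGES = ['en', 'zh', 'pt', 'th', 'vi', 'id', 'es', 'ru', 'ar',
                             'he', 'pl', 'hi', 'ja', 'it', 'ko', 'ms', 'tr'];

/**
 * Hypothetical helper: predicts where the quoted script would send a browser
 * that landed on $effectiveUrl with the given browser language.
 */
function predictLandingUrl(string $effectiveUrl, string $browserLanguage = 'en'): string
{
    // Drop the first "-region" suffix, like a.replace(/-\w+/, "") in the script.
    $language = (string) preg_replace('/-\w+/', '', $browserLanguage, 1);
    if (!in_array($language, SUPPORTED_LANGUAGES, true)) {
        $language = 'en';
    }
    if ($language === 'zh') {
        $language = 'zh_TW';
    }

    $parts = parse_url($effectiveUrl);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return $effectiveUrl; // not an absolute URL, nothing to predict
    }

    // A browser reports "/" as the pathname when the URL has no path.
    $path = $parts['path'] ?? '/';
    if (!in_array($path, ['/', '/index.html', '/cgi-bin/readtemplate'], true)) {
        return $effectiveUrl; // the other branches of the script are not modelled
    }

    $port = isset($parts['port']) ? ':' . $parts['port'] : '';
    return $parts['scheme'] . '://' . $parts['host'] . $port . '/' . $language . '/';
}

// Usage: feed in the effective URL reported by cURL.
echo predictLandingUrl('http://www.wechat.com/', 'en-US'), PHP_EOL;
```

When the page really has to be executed as a browser would, a headless browser is the more general approach; the port above is only practical because the quoted script is short and deterministic.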