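I need to get the last effective URL with cURL for this website:
http://www.wechat.com/en/
When I call curl_getinfo($curl, CURLINFO_EFFECTIVE_URL);
I get http://www.wechat.com/
but if you visit the site in a browser, you can see the URL change, with /en/ appended at the end.
Code: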
$url = 'http://www.wechat.com';
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_VERBOSE, false);
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 0);
curl_setopt($curl, CURLOPT_TIMEOUT, 5);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
// Run the request first; curl_getinfo() only reports transfer details after curl_exec().
$html = curl_exec($curl);
$ip = curl_getinfo($curl, CURLINFO_PRIMARY_IP);
$last_effective_url = curl_getinfo($curl, CURLINFO_EFFECTIVE_URL);
curl_close($curl);
echo $last_effective_url;
What am I missing here? Any help would be appreciated.
Answer 0 (score: 0)
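The problem here is that the /en/ URL is not reached through an HTTP redirect status code that cURL can interpret. cURL sees an HTTP 200 and stops there.
Instead, the page is redirected by JavaScript running in the browser, and cURL does not execute JavaScript.
So cURL's effective URL is 'http://www.wechat.com', which is the correct behavior.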
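One way to confirm this from PHP is to look at the whole array that curl_getinfo($curl) returns after curl_exec(), rather than only the effective URL. The sketch below is a minimal illustration: describe_curl_redirects() is a hypothetical helper name, and the sample array is hand-written to match the case described here, not captured from a live request. The keys 'url', 'http_code' and 'redirect_count' are the ones curl_getinfo() provides.

```php
<?php
// Hypothetical helper, not part of the question's code.
// $info is expected to be the array returned by curl_getinfo($curl) after curl_exec().
function describe_curl_redirects(array $info): string
{
    $code  = $info['http_code'] ?? 0;
    $count = $info['redirect_count'] ?? 0;
    $url   = $info['url'] ?? '';

    if ($count > 0) {
        // CURLOPT_FOLLOWLOCATION followed at least one Location header.
        return "Followed $count HTTP redirect(s); final URL: $url";
    }
    if ($code >= 200 && $code < 300) {
        // A plain success response: cURL has nothing further to follow.
        return "No HTTP redirect (status $code); any later URL change happens client-side: $url";
    }
    return "Stopped with status $code at $url";
}

// Sample values shaped like the case in the question (assumed, not measured).
echo describe_curl_redirects([
    'url'            => 'http://www.wechat.com/',
    'http_code'      => 200,
    'redirect_count' => 0,
]), PHP_EOL;
```

A redirect count of 0 together with a 200 status means the server never asked cURL to go anywhere else.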
Look at the source of http://www.wechat.com/ (view-source:http://www.wechat.com/index.html in a browser) to see the JavaScript that runs:
<script>
    (function () {
        Array.prototype.contain = function (f) {
            var e = this.length;
            while (e--) {
                if (this[e] === f) {
                    return true
                }
            }
            return false
        };
        var c = ["en", "zh", "pt", "th", "vi", "id", "es", "ru", "ar", "he", "pl", "hi", "ja", "it", "ko", "ms", "tr"],
            a = navigator.language || navigator.userLanguage || 'en',
            a = a.replace(/-\w+/, ""),
            d = location.pathname.match(/\/(\w+)\/(\w*)/i),
            b = c.contain(a) ? a : "en",
            b = b == "zh" ? "zh_TW" : b;
        if (location.pathname == "/index.html" || location.pathname == "/" || location.pathname == "/cgi-bin/readtemplate") {
            location.href = location.protocol + "//" + location.host + "/" + b + "/";
            return
        }
        if (d && d[1]) {
            return
        }
        location.href = location.href.replace(/\w+\/(\w+)\//i, function (e, f) {
            return e.replace(f, b)
        })
    })();
</script>
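If you still need the URL a browser would end up on, one option is to reproduce the script's logic in PHP. The following is only a sketch under assumptions: guess_js_redirect_target() is a hypothetical name, it mirrors just the first branch of the script above (the root paths), and the browser language has to be supplied by the caller because cURL has no navigator.language. It will stop matching if the site changes its script.

```php
<?php
// Hypothetical helper: mirrors only the first branch of the inline script quoted above.
// Returns the URL the script would navigate to, or null if that branch does not apply.
function guess_js_redirect_target(string $url, string $browserLanguage = 'en'): ?string
{
    // Language codes copied from the script's array `c`.
    $supported = ['en', 'zh', 'pt', 'th', 'vi', 'id', 'es', 'ru', 'ar', 'he',
                  'pl', 'hi', 'ja', 'it', 'ko', 'ms', 'tr'];

    // Same as a.replace(/-\w+/, ""): drop the first "-REGION" part, e.g. "en-US" becomes "en".
    $lang = preg_replace('/-\w+/', '', $browserLanguage, 1);
    if (!in_array($lang, $supported, true)) {
        $lang = 'en';
    }
    if ($lang === 'zh') {
        $lang = 'zh_TW';
    }

    $parts = parse_url($url);
    if (!is_array($parts) || !isset($parts['scheme'], $parts['host'])) {
        return null;
    }
    // A URL without a path corresponds to location.pathname "/" in a browser.
    $path = $parts['path'] ?? '/';

    // The script only builds "/<lang>/" for these three paths.
    $rootPaths = ['/index.html', '/', '/cgi-bin/readtemplate'];
    if (!in_array($path, $rootPaths, true)) {
        return null;
    }

    // location.host includes the port when one is present.
    $host = $parts['host'] . (isset($parts['port']) ? ':' . $parts['port'] : '');
    return $parts['scheme'] . '://' . $host . '/' . $lang . '/';
}

echo guess_js_redirect_target('http://www.wechat.com', 'en-US'), PHP_EOL;
// prints "http://www.wechat.com/en/"
```

You could pass the effective URL reported by cURL into this function. For anything less site-specific, a headless browser that actually runs the JavaScript is the more general approach.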