I'm trying to fetch an RSS feed with cURL, but for some reason I get the wrong data:
$url = "http://rss.news.yahoo.com/rss/oddlyenough";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
$xml = curl_exec($ch);
curl_close($ch);
echo htmlentities($xml, ENT_QUOTES, "UTF-8");
Output:
<!-- rc2.ops.ch1.yahoo.com uncompressed/chunked Sun Nov 25 15:57:06 UTC 2012 -->
If I load the same feed another way, I get the correct data. For example, this works:
$xml = simplexml_load_file('http://rss.news.yahoo.com/rss/oddlyenough');
print "<ul>\n";
foreach ($xml->channel->item as $item){
print "<li>$item->title</li>\n";
}
print "</ul>";
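Can you tell me what is wrong with the cURL code?
Answer 0 (score: 2)
You are running into a Location redirect: the server answers with a redirect to a new address, and cURL does not follow redirects unless told to.
Add this option: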
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
so that the code becomes:
$url = "http://rss.news.yahoo.com/rss/oddlyenough";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
$xml = curl_exec($ch);
curl_close($ch);
echo htmlentities($xml, ENT_QUOTES, "UTF-8");
When you request that URL, the first response you receive from Yahoo! is:
HTTP/1.0 301 Moved Permanently
Date: Sun, 25 Nov 2012 16:31:36 GMT
P3P: policyref="http://info.yahoo.com/w3c/p3p.xml", CP="CAO DSP COR CUR ADM DEV TAI PSA PSD IVAi IVDi CONi TELo OTPi OUR DELi SAMi OTRi UNRi PUBi IND PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA POL HEA PRE LOC GOV"
Cache-Control: max-age=3600, public
Location: http://news.yahoo.com/rss/oddlyenough
Vary: Accept-Encoding
Content-Type: text/html; charset=utf-8
Age: 1586
Content-Length: 81
Via: HTTP/1.1 rc4.ops.ch1.yahoo.com (YahooTrafficServer/1.20.10 [cHs f ])
Server: YTS/1.20.10
<!-- rc4.ops.ch1.yahoo.com uncompressed/chunked Sun Nov 25 16:31:36 UTC 2012 -->
The 301 status and the Location header tell you to use the new address, http://news.yahoo.com/rss/oddlyenough.
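To make the redirect visible without any network access, here is a minimal sketch that pulls the status code and the Location header out of a raw response like the one shown above. The helper name parseRedirect and the shortened sample response are assumptions made for illustration; they are not part of cURL or of the original code.

```php
<?php
// Hypothetical helper (not part of cURL): extract the status code and the
// Location header from a raw HTTP response string.
function parseRedirect(string $rawResponse): array
{
    // Headers end at the first blank line; everything after it is the body.
    $parts = preg_split('/\r?\n\r?\n/', $rawResponse, 2);
    $lines = preg_split('/\r?\n/', trim($parts[0]));

    // The status line looks like "HTTP/1.0 301 Moved Permanently".
    $status = null;
    if (preg_match('#^HTTP/\S+\s+(\d{3})#', $lines[0], $m)) {
        $status = (int) $m[1];
    }

    // Header names are case-insensitive, so match "Location:" loosely.
    $location = null;
    foreach (array_slice($lines, 1) as $line) {
        if (stripos($line, 'Location:') === 0) {
            $location = trim(substr($line, strlen('Location:')));
            break;
        }
    }

    return ['status' => $status, 'location' => $location];
}

// Shortened, illustrative copy of the response shown above.
$raw = "HTTP/1.0 301 Moved Permanently\r\n"
     . "Location: http://news.yahoo.com/rss/oddlyenough\r\n"
     . "Content-Type: text/html; charset=utf-8\r\n"
     . "\r\n"
     . "<!-- placeholder body -->";

$info = parseRedirect($raw);
echo $info['status'], ' -> ', $info['location'], "\n";
```

A 3xx status together with a non-empty location is the signal that the body you received is only the redirect stub, not the feed itself.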
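In fact, if you use the new address directly, the original code works (at least until they change the address again) and is faster, since it needs only one request instead of two.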
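If you would rather keep following redirects but still notice when the address has moved, one possible approach is to ask cURL where it ended up. The sketch below is an illustration under assumptions: the function name fetchFeed and the limit of 5 redirects are arbitrary choices, not something taken from the code above.

```php
<?php
// Hypothetical wrapper around the cURL calls shown above; the name fetchFeed
// and the redirect limit of 5 are illustrative choices.
function fetchFeed(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow 301/302 responses
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);         // give up on redirect loops
    curl_setopt($ch, CURLOPT_HEADER, false);

    $body = curl_exec($ch);
    if ($body === false) {
        // curl_exec() returns false on transport errors (DNS, refused, ...).
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("cURL request failed: $error");
    }

    $result = [
        'body'      => $body,
        'status'    => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'finalUrl'  => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
        'redirects' => curl_getinfo($ch, CURLINFO_REDIRECT_COUNT),
    ];
    curl_close($ch);

    return $result;
}
```

After calling fetchFeed() with the old address, a redirects value greater than zero means finalUrl differs from what you requested, and storing finalUrl for next time saves the extra round trip.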