Question

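I want to fetch some content from websites, so I am using file_get_contents or curl in PHP. The problem is that these functions do not work for every site: for example, they work for google.com but not for iteye.com. My code is as follows: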

$baseurl = 'http://www.iteye.com/';  
$contents = file_get_contents($baseurl);

//OR
$ch = curl_init();
$timeout = 10;
curl_setopt ($ch, CURLOPT_URL, $baseurl);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$list = curl_exec($ch);

My guess is that this site blocks these functions (file_get_contents or curl). How can I still fetch content from sites like iteye.com?

Answer 1

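If you want to fetch arbitrary websites, I suggest you use cURL.

You have to pay attention to:

HTTP redirects, e.g. 301 and 302
The user agent
HTTPS
Sometimes the referer can be an issue too

You have to behave as much like a human visitor (a browser) as possible.

So these directives should probably not be missing from your code: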

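curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow 301/302 redirects
curl_setopt($ch, CURLOPT_AUTOREFERER, true);     // set the Referer header automatically when following redirects
curl_setopt($ch, CURLOPT_REFERER, 'http://www.google.com');
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0');
// Turning off peer verification skips the TLS certificate check; only do this for testing.
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);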
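
When a request still fails after setting these options, it helps to see why instead of getting back an empty result. The following is a minimal sketch, not a definitive implementation: the helper names build_curl_options() and fetch_url(), the returned array shape, and the CURLOPT_MAXREDIRS limit of 5 are assumptions added for illustration. It leaves certificate verification at its default.

```php
<?php
// Hypothetical helpers: the names and the returned array shape are
// illustrative assumptions, not part of the original answer.

/** Collect the cURL options discussed above for a given URL. */
function build_curl_options(string $url, int $timeout = 10): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,      // return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT => $timeout,  // seconds to wait while connecting
        CURLOPT_FOLLOWLOCATION => true,      // follow 301/302 redirects
        CURLOPT_MAXREDIRS      => 5,         // assumed limit, guards against redirect loops
        CURLOPT_AUTOREFERER    => true,      // set Referer automatically on redirects
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0',
    ];
}

/** Fetch a URL and report the HTTP status and any cURL error. */
function fetch_url(string $url, int $timeout = 10): array
{
    $ch = curl_init();
    curl_setopt_array($ch, build_curl_options($url, $timeout));

    $body   = curl_exec($ch);                           // false on failure
    $error  = ($body === false) ? curl_error($ch) : null;
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);    // 0 if no HTTP response arrived

    return ['body' => $body, 'status' => $status, 'error' => $error];
}
```

Calling fetch_url('http://www.iteye.com/') and inspecting the 'status' and 'error' entries shows whether the site refused the request (for example with a 403) or whether the connection itself failed.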

Answer 2

You probably need to tell curl to follow redirects, and also to change the user agent:

$ch = curl_init();
$timeout = 10;
curl_setopt ($ch, CURLOPT_URL, $baseurl);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, $timeout);

curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt ($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');

$list = curl_exec($ch);
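
The same two adjustments can be applied to file_get_contents, which the question also tried, by passing a stream context. This is a rough sketch under stated assumptions: the helper name build_http_context() and the specific values (5 redirects, 10 second timeout) are illustrative, and fetching URLs this way requires allow_url_fopen to be enabled.

```php
<?php
// Hypothetical helper: the name and the chosen option values are
// illustrative assumptions, not something the original answers specify.

/** Build an HTTP stream context with a browser-like User-Agent. */
function build_http_context(string $userAgent, int $timeout = 10)
{
    return stream_context_create([
        'http' => [
            'method'          => 'GET',
            'user_agent'      => $userAgent,  // sent as the User-Agent header
            'follow_location' => 1,           // follow Location redirects
            'max_redirects'   => 5,           // assumed limit on redirects
            'timeout'         => $timeout,    // read timeout in seconds
            'ignore_errors'   => true,        // still return the body on 4xx/5xx responses
        ],
    ]);
}
```

It could then be used as file_get_contents($baseurl, false, build_http_context('Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0')). With ignore_errors set, an error page returned by the site is readable rather than producing a bare false.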

PHP: file_get_contents and curl do not work for some websites
