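I'm just trying to fetch the Yahoo home page, www.yahoo.com.
If I run my simple script from a hosted website, it works fine.
If I try it from my localhost, all I get back is a header-like response: "w32.fp.re1.yahoo.com uncompressed / chunked Wed Apr 27 15:13:48 PDT 2011"
Here is my code: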
<?php
function curl_download($Url){
    // Is cURL installed?
    if (!function_exists('curl_init')){
        die('Sorry cURL is not installed!');
    }
    // Create a new cURL resource handle
    $ch = curl_init();
    // Now set some options (most are optional)
    // Set URL to download
    curl_setopt($ch, CURLOPT_URL, $Url);
    // Set a referer
    curl_setopt($ch, CURLOPT_REFERER, "http://www.example.org/yay.htm");
    // User agent
    curl_setopt($ch, CURLOPT_USERAGENT, "MozillaXYZ/1.0");
    // Include response headers in the result? (0 = no, 1 = yes)
    curl_setopt($ch, CURLOPT_HEADER, 0);
    // Should cURL return or print out the data? (true = return, false = print)
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Timeout in seconds
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    // Download the given URL, and return output
    $output = curl_exec($ch);
    // Close the cURL resource, and free system resources
    curl_close($ch);
    return $output;
}
print curl_download('http://www.yahoo.com/');
?>
Answer 0 (score: 1)
Actually, the response starts with
HTTP/1.1 302 Found
which means there is a Location header in there as well:
Location: http://nl.yahoo.com/?p=us
And this is just the body of that redirect response:
<!-- w20.fp.ird.yahoo.com uncompressed/chunked Wed Apr 27 17:08:09 PDT 2011 -->
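To make the redirect visible, the raw response headers can be inspected (for example by temporarily setting CURLOPT_HEADER to 1). The sketch below is only an illustration: `parse_redirect` is a hypothetical helper, not part of the original code, and the sample header text is assembled from the two header lines quoted above.

```php
<?php
// Hypothetical helper (an assumption for illustration): pulls the status
// code and the Location header out of a raw HTTP header block.
function parse_redirect(string $rawHeaders): array {
    $status = null;
    $location = null;
    foreach (preg_split('/\r\n|\n|\r/', trim($rawHeaders)) as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            // Status line, e.g. "HTTP/1.1 302 Found"
            $status = (int) $m[1];
        } elseif (stripos($line, 'Location:') === 0) {
            // Header names are case-insensitive, hence stripos()
            $location = trim(substr($line, strlen('Location:')));
        }
    }
    return ['status' => $status, 'location' => $location];
}

// Sample built from the header lines shown in this answer.
$sample = "HTTP/1.1 302 Found\r\nLocation: http://nl.yahoo.com/?p=us\r\n";
$info = parse_redirect($sample);
```

A 3xx status together with a non-empty location indicates that the body received is only the redirect page, not the page that was actually requested.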
You need to tell cURL to follow Location headers. That's all.
The option is called CURLOPT_FOLLOWLOCATION. Set it to true:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
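For context, here is a minimal sketch of how the whole download function might look with redirects enabled. The names `build_curl_options` and `curl_download_following` are hypothetical, and the cap of 5 redirects via CURLOPT_MAXREDIRS is an assumption added to avoid redirect loops; neither comes from the original question.

```php
<?php
// Hypothetical helper: collects the options in one array so they are easy
// to inspect. The default cap of 5 redirects is an illustrative assumption.
function build_curl_options(string $url, int $maxRedirects = 5): array {
    return [
        CURLOPT_URL            => $url,
        CURLOPT_REFERER        => "http://www.example.org/yay.htm",
        CURLOPT_USERAGENT      => "MozillaXYZ/1.0",
        CURLOPT_HEADER         => false,  // do not include headers in the output
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow Location headers (the fix)
        CURLOPT_MAXREDIRS      => $maxRedirects,
        CURLOPT_TIMEOUT        => 10,     // seconds
    ];
}

// Hypothetical variant of curl_download() that follows redirects.
// Returns the response body as a string, or false on failure.
function curl_download_following(string $url) {
    $ch = curl_init();
    curl_setopt_array($ch, build_curl_options($url));
    // The handle is released when $ch goes out of scope.
    return curl_exec($ch);
}
```

Usage would be the same as before, for example `print curl_download_following('http://www.yahoo.com/');`.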
PS
When I ran your code plus FOLLOWLOCATION, cURL followed 1 Location header, and this is the start of the response body:
<!DOCTYPE html>
<html lang="nl-NL" class="y-fp-bg y-fp-pg-grad bkt732">