Getting the full <doctype> when using curl

Time: 2012-03-31 11:47:39

Tags: php curl

I want

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

but I am getting

<!DOCTYPE html>
<html lang="en" id="facebook" class="no_js">

I am fetching the HTTP response body with curl from PHP, using the code below:

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.facebook.com/');
// File that will receive the response body.
$file = fopen("/var/www/myapp/welcome.txt", "w+");
curl_setopt($ch, CURLOPT_FAILONERROR, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 15);
curl_setopt($ch, CURLOPT_COOKIE, "PHPSESSID=5b1sXXXXo5niv5p0t24ntbh56X;fusion_user=13XXX.cXXX282138afbe9066b8be1cb426841d");
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (X11; U; Linux i686; it; rv:1.8.1.5) Gecko/20070713 Firefox/2.0.0.5");
curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
// Note: this disables TLS certificate verification.
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
// Send the response body to $file.
curl_setopt($ch, CURLOPT_FILE, $file);
$retValue = curl_exec($ch);
fclose($file);
curl_close($ch);

2 answers:

Answer 0 (score: 3)

Facebook uses the HTML5 doctype, which is simply <!DOCTYPE html>. You can see it in the page source of facebook.com.
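
To check which doctype a server actually sends, one option is to pull the declaration out of the fetched body and look at it directly. The following is only a minimal sketch, assuming PHP 7.1 or later with the curl extension; extract_doctype and fetch_body are hypothetical helper names made up for illustration, not part of the original post or of any library.

```php
<?php
// Hypothetical helper: returns the doctype declaration found at the
// start of an HTML document, or null if there is none.
function extract_doctype(string $html): ?string
{
    // Allow an optional UTF-8 BOM and leading whitespace, then match
    // "<!DOCTYPE ...>" case-insensitively up to the first ">".
    if (preg_match('/^(?:\xEF\xBB\xBF)?\s*(<!DOCTYPE[^>]*>)/i', $html, $m)) {
        return $m[1];
    }
    return null;
}

// Hypothetical helper: fetches a URL and returns the body as a string.
// It is only defined here, not called, so loading this file makes no request.
function fetch_body(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('curl failed: ' . curl_error($ch));
    }
    return $body;
}
```

Calling extract_doctype(fetch_body('http://www.facebook.com/')) should then show whatever declaration the server sent. A short <!DOCTYPE html> is a complete HTML5 doctype, not a truncated HTML 4.01 one, so curl has not dropped anything.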

Answer 1 (score: 0)

Change http://www.facebook.com/ to the URL of a page that actually contains the doctype you want.

Facebook uses the doctype you are already getting.