PHP cURL: curl_exec returns an encoded (garbled) result

Date: 2014-07-22 13:14:44

Tags: php html curl web-scraping

I am trying to fetch a URL with this code:

<?php
    $url='http://www.lacapital.com.ar/secciones/laciudad.html';
    $userAgent = 'Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_URL,$url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    $result = curl_exec($ch);
    curl_close($ch);
?>

But $result gives me this:

  

>> * 7¥Ü†B'泰gM'²¹ÍþÊð7:!w ^ -o™©IHX $CÇrxPdÜ

     

[... remaining output omitted: several kilobytes of similarly unreadable characters ...]

1 answer:

Answer 0 (score: 1)

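I tried your code, and yes, the output is gibberish. The response body appears to be compressed and cURL is returning it without decoding it. To fix this, set the CURLOPT_ENCODING option on your cURL handle. For example: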

$url = 'http://www.lacapital.com.ar/secciones/laciudad.html';
$userAgent = 'Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0';
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_ENCODING, ''); // add this: empty string accepts every encoding cURL supports and decodes the response

$result = curl_exec($ch);
curl_close($ch);

echo $result;

Note: as @Akshay suggested, this also works:

curl_setopt($ch, CURLOPT_ENCODING, 'gzip'); // add this one
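
The garbled output in the question is consistent with a gzip-compressed body that was never decoded. If the cURL options cannot be changed (for example, the body comes from a cache or another library), one possible fallback is to decode it by hand. The sketch below is an assumption-laden illustration, not part of the original answer: the helper name decodeIfGzipped is hypothetical, it assumes PHP 7+ with the zlib extension, and it simulates a compressed response with gzencode instead of making a network request.

```php
<?php
// Hypothetical helper (not from the original answer): decode a response
// body only if it starts with the gzip magic bytes 0x1f 0x8b.
function decodeIfGzipped(string $body): string
{
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        $decoded = gzdecode($body); // requires the zlib extension
        if ($decoded !== false) {
            return $decoded;
        }
    }
    return $body; // not gzip (or decoding failed): return unchanged
}

// Simulate a compressed response rather than fetching the real page.
$html = '<html><body>La Ciudad</body></html>';
$compressed = gzencode($html);

var_dump($compressed === $html);                  // bool(false): raw bytes look like garbage
var_dump(decodeIfGzipped($compressed) === $html); // bool(true): decoded back to the HTML
var_dump(decodeIfGzipped($html) === $html);       // bool(true): plain text passes through
```

Setting CURLOPT_ENCODING as shown in the answer is still the simpler route, since cURL then negotiates and decodes the encoding itself.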