Fetching a Russian-language page with cURL

Date: 2017-04-26 13:02:55

Tags: php curl character-encoding

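I am fetching this Russian-language page: https://web.archive.org/web/20060403041216/http://inostranets.ru:80/

When I fetch this page with PHP cURL, I run into an encoding problem.

Here is the code I am using: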

$url="https://web.archive.org/web/20060403041216/http://inostranets.ru:80/";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);         
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch,CURLOPT_USERAGENT,'waybackmachinedownloader');
$html = curl_exec($ch);

As a result, I get characters like these: “ÂÍÅÊÎÍÊÓÐÅÍÖÈÈ - ÑÊÀÇÎ×ÍÛÉÑÈÍÃÀÏÓÐ Òóðîïåðàòîð«ÄÅλïðèãëàøàåòÂàñïîîåòèòò”

2 answers:

Answer 0 (score: 2)

The page you are trying to parse is encoded as windows-1251. To tell the browser that you are outputting windows-1251, you can use:

header('Content-Type: text/html; charset=windows-1251');

That is:

$url="https://web.archive.org/web/20060403041216/http://inostranets.ru:80/";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch,CURLOPT_USERAGENT,'waybackmachinedownloader');
$html = curl_exec($ch);

header('Content-Type: text/html; charset=windows-1251');
print $html;
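
The header above hardcodes windows-1251. When the charset is not known in advance, one option is to read it from the response itself. The sketch below is only an illustration: `detect_charset` is a hypothetical helper, not part of cURL, and it assumes the charset is declared either in the HTTP Content-Type header (which `curl_getinfo($ch, CURLINFO_CONTENT_TYPE)` returns after `curl_exec`) or in a `<meta>` tag of the page. Whether the archived page declares it in either place has not been checked here.

```php
<?php
// Hypothetical helper: find a declared charset, header first, markup second.
function detect_charset(?string $contentType, string $html): ?string
{
    // 1. Prefer the charset parameter of the HTTP Content-Type header.
    if ($contentType !== null
        && preg_match('/charset\s*=\s*"?([\w.:-]+)/i', $contentType, $m)) {
        return strtolower($m[1]);
    }

    // 2. Fall back to a <meta ... charset=...> declaration in the page.
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)/i', $html, $m)) {
        return strtolower($m[1]);
    }

    // Nothing declared: the caller has to pick a default.
    return null;
}
```

With the cURL handle from the answer, this could be used as `$charset = detect_charset(curl_getinfo($ch, CURLINFO_CONTENT_TYPE) ?: null, $html) ?? 'windows-1251';` followed by `header('Content-Type: text/html; charset=' . $charset);`.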

Update

To save $html to a file, use:

file_put_contents("curl_russian.html", $html);

Note:

When you open the HTML file, be sure to select Text Encoding > Cyrillic (Windows) in your browser.
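
An alternative to choosing the encoding by hand is to convert the markup to UTF-8 before saving it. This is a minimal sketch under stated assumptions: `to_utf8_html` is a hypothetical helper, the input is assumed to really be Windows-1251, and the first `charset=...` occurrence in the markup is assumed to be the page's own declaration.

```php
<?php
// Hypothetical helper: re-encode a page as UTF-8 and update its charset label.
function to_utf8_html(string $html, string $from = 'Windows-1251'): string
{
    // Re-encode the raw bytes; ASCII markup is unchanged by this step.
    $utf8 = mb_convert_encoding($html, 'UTF-8', $from);

    // Rewrite the first charset declaration so it matches the new bytes.
    $fixed = preg_replace('/(charset\s*=\s*["\']?)[\w.:-]+/i', '${1}utf-8', $utf8, 1);

    // preg_replace returns null on error; keep the converted text in that case.
    return $fixed ?? $utf8;
}
```

The file could then be written with `file_put_contents("curl_russian.html", to_utf8_html($html));`. A browser that honors the declaration should pick UTF-8 without a manual setting.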

Answer 1 (score: 0)

I found the problem.

I only needed to convert the output, like this:

$html = mb_convert_encoding($html, "UTF-8", "Windows-1251"); 

instead of:

$html = mb_convert_encoding($html, "UTF-8", "Windows-1251 (CP1251)");
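
The difference between the two calls is the source-encoding label: mbstring expects a name it recognizes, and a label with a parenthetical note attached is not one. The sketch below illustrates this on a short sample string. `is_listed_encoding` is a hypothetical helper that only compares against the canonical names from `mb_list_encodings()`, so aliases such as CP1251 are deliberately not covered.

```php
<?php
// Hypothetical helper: is $label one of mbstring's canonical encoding names?
function is_listed_encoding(string $label): bool
{
    foreach (mb_list_encodings() as $name) {
        if (strcasecmp($name, $label) === 0) {
            return true;
        }
    }
    return false;
}

// Sample input: the word "Привет" as raw Windows-1251 bytes.
$cp1251 = "\xCF\xF0\xE8\xE2\xE5\xF2";

$label = 'Windows-1251';
if (is_listed_encoding($label)) {
    // Safe to convert: the label is a name mbstring knows.
    $utf8 = mb_convert_encoding($cp1251, 'UTF-8', $label);
}
```

Checking the label first avoids depending on how a given PHP version reports an unknown encoding name.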