I get an encoding problem when fetching this Russian-language page with PHP cURL: https://web.archive.org/web/20060403041216/http://inostranets.ru:80/
Below is the code I am using:
$url="https://web.archive.org/web/20060403041216/http://inostranets.ru:80/";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch,CURLOPT_USERAGENT,'waybackmachinedownloader');
$html = curl_exec($ch);
As a result, I get characters similar to these: “ÂÍÅÊÎÍÊÓÐÅÍÖÈÈ - ÑÊÀÇÎ×ÍÛÉÑÈÍÃÀÏÓÐ Òóðîïåðàòîð«ÄÅλïðèãëàøàåòÂàñïîîåòèòò”
Answer 0 (score: 2)
The page you are trying to parse is encoded in windows-1251.
To tell the browser that your output is windows-1251, you can use:
header('Content-Type: text/html; charset=windows-1251');
That is:
$url="https://web.archive.org/web/20060403041216/http://inostranets.ru:80/";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch,CURLOPT_USERAGENT,'waybackmachinedownloader');
$html = curl_exec($ch);
header('Content-Type: text/html; charset=windows-1251');
print $html;
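If you are not sure which encoding a page uses, the charset is usually declared in the Content-Type response header or in a meta tag. The following is only a minimal sketch: `detectCharset` is a hypothetical helper name, not part of cURL or PHP, and the sample markup is made up for illustration.

```php
<?php
// Hypothetical helper: pick a charset label from a Content-Type header
// value, falling back to a <meta> declaration inside the HTML.
function detectCharset(?string $contentType, string $html): ?string
{
    // e.g. "text/html; charset=windows-1251"
    if ($contentType !== null
        && preg_match('/charset\s*=\s*["\']?([\w.:-]+)/i', $contentType, $m)) {
        return strtolower($m[1]);
    }
    // e.g. <meta http-equiv="Content-Type" content="text/html; charset=windows-1251">
    // or   <meta charset="windows-1251">
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)/i', $html, $m)) {
        return strtolower($m[1]);
    }
    return null; // nothing declared; the caller has to guess
}

// With cURL, the header value would come from:
// $contentType = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
$sample = '<html><head><meta http-equiv="Content-Type" '
        . 'content="text/html; charset=windows-1251"></head></html>';
echo detectCharset('text/html', $sample), "\n";
```

The header is checked first because it normally takes precedence over a meta declaration; if neither is present the function returns null and you have to fall back to a guess.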
Update
To save $html to a file, use:
file_put_contents("curl_russian.html", $html);
Note:
When you open the HTML file, make sure to set the browser's Text Encoding to Cyrillic (Windows).
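An alternative that avoids picking the encoding by hand is to store a UTF-8 copy and update the charset declaration in the markup to match. This is a rough sketch, assuming the mbstring extension is loaded; `toUtf8Html`, the sample markup, and the output file name are illustrative assumptions.

```php
<?php
// Hypothetical helper: convert windows-1251 HTML to UTF-8 and update the
// charset declaration inside the markup so it matches the new bytes.
function toUtf8Html(string $html): string
{
    $utf8 = mb_convert_encoding($html, 'UTF-8', 'Windows-1251');
    // Rewrite "charset=windows-1251" (any case, optional quote) to utf-8.
    $fixed = preg_replace(
        '/charset\s*=\s*(["\']?)windows-1251/i',
        'charset=${1}utf-8',
        $utf8
    );
    return $fixed ?? $utf8; // preg_replace returns null on a regex error
}

// "\xCF\xF0\xE8\xE2\xE5\xF2" is the word "Привет" in windows-1251 bytes.
$html = '<meta charset="windows-1251"><p>' . "\xCF\xF0\xE8\xE2\xE5\xF2" . '</p>';
$path = sys_get_temp_dir() . '/curl_russian_utf8.html'; // assumed file name
file_put_contents($path, toUtf8Html($html));
echo file_get_contents($path), "\n";
```

In real use you would pass the `$html` returned by `curl_exec` instead of the sample string.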
Answer 1 (score: 0)
I found the problem.
I just had to convert the encoding of the output, like this:
$html = mb_convert_encoding($html, "UTF-8", "Windows-1251");
instead of:
$html = mb_convert_encoding($html, "UTF-8", "Windows-1251 (CP1251)");
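One way to catch a bad encoding label like the second one before converting is to compare it against the names and aliases that mbstring reports. A minimal sketch, assuming the mbstring extension is loaded; `isKnownEncoding` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: true if mbstring knows $label as an encoding
// name or as one of that encoding's aliases (case-insensitive).
function isKnownEncoding(string $label): bool
{
    foreach (mb_list_encodings() as $name) {
        if (strcasecmp($name, $label) === 0) {
            return true;
        }
        // Older PHP versions may return false here, hence the fallback.
        foreach (mb_encoding_aliases($name) ?: [] as $alias) {
            if (strcasecmp($alias, $label) === 0) {
                return true;
            }
        }
    }
    return false;
}

var_dump(isKnownEncoding('Windows-1251'));
var_dump(isKnownEncoding('Windows-1251 (CP1251)'));
```

The second label looks like the display form used in the documentation's list of supported encodings rather than a name mbstring accepts, so the check should reject it while accepting the plain `Windows-1251`.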