Question

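I'm using file_get_contents to retrieve and save the HTML of certain websites. I don't know in advance what encoding each site uses.

I use the following, which works for most sites: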

$html = file_get_contents($url);
$encoding = mb_detect_encoding($html);

if($encoding != 'UTF-8') {
   $html = mb_convert_encoding($html, "UTF-8", $encoding); 
}

This usually works, but a few sites return something like this:

1f8b 0800 0000 0000 0003 ed7d eb72 db46
b6ee efb8 6ade a1c3 a98a a43d 0489 fb45
b6e4 7194 4ce2 d976 e21d 799c 3367 9272

There are about 1,000 lines of this garbage. What is it, and how do I fix it so that I get the page's HTML back?

Thanks.

Answer 1

You could try using DOMDocument for this. For example:

$html = new DOMDocument();
$html->loadHTMLFile($url);
$html->saveHTMLFile('your_file_name');
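
As for what the garbage is: the dump in the question starts with the bytes 1f 8b 08, which is the signature of a gzip stream using deflate. The server is most likely sending the page compressed, and file_get_contents does not decompress it. A minimal sketch of one way to handle that, assuming the zlib extension is available; decode_if_gzipped is a hypothetical helper name, not something from the original post:

```php
<?php
// Hypothetical helper: returns the body unchanged unless it begins with
// the gzip magic bytes 1f 8b, in which case it is decompressed first.
function decode_if_gzipped(string $body): string
{
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        // gzdecode() is provided by the zlib extension and returns
        // false when the data is not a valid gzip stream.
        $decoded = gzdecode($body);
        if ($decoded !== false) {
            return $decoded;
        }
    }
    return $body;
}

// Self-check without network access: compress a sample page, then decode it.
$sample = '<html><body>hello</body></html>';
$roundTrip = decode_if_gzipped(gzencode($sample));
```

In the question's code this would be applied as $html = decode_if_gzipped(file_get_contents($url)); before the mb_detect_encoding step, so that encoding detection runs on the decompressed HTML and not on the compressed bytes.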
