Question

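I want to fetch the titles of some UTF-8 websites. I can retrieve them, but when I use them (in my main work, as an AJAX response), for example by echoing the content, they are not displayed correctly even though the browser is set to UTF-8. Where is the problem?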

---------------Update----------------------
I tested all of the solutions below, but they do not work for the following websites; the result is still not UTF-8 for them:

http://roozannews.ir/detail/News/1553
http://asiaepress.com/detail/News/2226
http://www.hemayatonline.ir/detail/News/377
http://ecobourse.ir/detail/News/2308

--------------------End of update------------------

    $url = 'http://farhangipress.ir/detail/News/6753';
    $html = file_get_contents_curl($url);

    if ($html) {
        // parsing begins here:
        $doc = new DOMDocument();
        @$doc->loadHTML($html);
        $nodes = $doc->getElementsByTagName('title');

        // get and display what you need:
        $title = $nodes->item(0)->nodeValue;
        $title = iconv(mb_detect_encoding($title, mb_detect_order(), true), "UTF-8", $title);
        echo $title;
        .....

My file_get_contents_curl is:

function file_get_contents_curl($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);

    $data = curl_exec($ch);
    $info = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);

    //checking mime types
    if (strstr($info, 'text/html')) {
        curl_close($ch);
        return $data;
    } else {
        return false;
    }
}

At first I tried:

    $title = $nodes->item(0)->nodeValue;
    echo $title;

but it did not work correctly.

Answer 1

Try adding a Content-Type header at the top and you should be fine.

<?php
header('Content-Type: text/html;charset=utf-8');
$url='http://farhangipress.ir/detail/News/6755';
$html = file_get_contents($url);
echo $html;

The `cURL` way:

<?php
header('Content-Type: text/html;charset=utf-8');
function file_get_contents_curl($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

$url='http://farhangipress.ir/detail/News/6753';
echo $html = file_get_contents_curl($url);

OUTPUT: (image not available)

Answer 2

Try this; I tested it in a Persian RSS reader. Please give feedback.

 $doc = new DOMDocument(); // the constructor takes a version and an encoding, not a meta tag
 $doc->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));

Answer 3

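You are just handing DOMDocument a string of HTML; it has no idea which charset that string is in.

A website can tell you which charset its content is in in several ways:

The HTTP Content-Type header, e.g. Content-Type: text/html; charset=utf-8
The HTML Content-Type meta tag, e.g. <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

The first one (the HTTP header) is useless in this context: file_get_contents_curl() reads the data from the server and returns only the body, which is what gets handed to DOMDocument, so DOMDocument never sees the HTTP headers.

The second one would correctly tell DOMDocument which charset the content is in, but it may be missing on many websites (since the HTTP header already tells the browser the charset, there is usually no need to add the HTML meta tag as well).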
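
If you still want to use the charset announced in the HTTP header, one option is to read it yourself before parsing. The following is only a minimal sketch: charset_from_content_type() is a hypothetical helper name, not part of PHP or cURL, and it assumes the header value has the usual "type/subtype; charset=..." shape.

```php
<?php
// Hypothetical helper: extract the charset parameter from a Content-Type
// header value. Returns the charset in upper case, or null if none is given.
function charset_from_content_type(?string $contentType): ?string
{
    if ($contentType === null) {
        return null;
    }
    // Match charset=utf-8 as well as charset="utf-8", case-insensitively.
    if (preg_match('/charset\s*=\s*"?([^\s;"]+)/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    return null;
}

// Example with a literal header value; in the question's code the value
// would come from curl_getinfo($ch, CURLINFO_CONTENT_TYPE).
var_dump(charset_from_content_type('text/html; charset=utf-8'));
```

With the charset known, the body can be converted to UTF-8 (for example with mb_convert_encoding()) before it is passed to DOMDocument. An unrecognised charset name would need its own handling.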

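So, to let DOMDocument know which charset the HTML is in, you can:

Make sure the Content-Type meta tag is present in the HTML
Prepend an XML preamble that defines the charset to your content: <?xml encoding="utf-8" ?>
Use mb_convert_encoding()

So this should work: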

$doc = new DOMDocument();
$html = '<?xml encoding="utf-8" ?>' . $html;
$doc->loadHTML($html);

or

$doc = new DOMDocument();
$html = mb_convert_encoding( $html, 'HTML-ENTITIES', 'UTF-8' );
$doc->loadHTML($html);

Even this should work, although strictly speaking the meta tag belongs inside <head> (not in front of the whole HTML):

$doc = new DOMDocument();
$html = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html;
$doc->loadHTML($html);

Note that passing the encoding as the second argument to DOMDocument's constructor will not work:

$doc = new DOMDocument('1.0', 'UTF-8'); // won't work!

Answer 4

Try the following:

$dom = new DOMDocument('1.0', 'UTF-8');

Answer 5

Finally I found the solution:

$html = file_get_contents_curl($url);
$html=mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8');
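
One caveat: as of PHP 8.2, passing 'HTML-ENTITIES' to mb_convert_encoding() is deprecated. The sketch below swaps in a different technique, mb_encode_numericentity(), which also turns every non-ASCII character into an entity before parsing. It is not the code from any of the answers; extract_title_utf8() is a hypothetical helper name, and the input is assumed to already be UTF-8.

```php
<?php
// Hypothetical helper (not from the answers above): returns the <title> text
// of a UTF-8 HTML string, or null when there is no <title> element.
function extract_title_utf8(string $html): ?string
{
    // Turn every non-ASCII code point into a numeric entity, so the parser
    // only sees ASCII and cannot misinterpret the bytes.
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    $ascii = mb_encode_numericentity($html, $map, 'UTF-8');

    $doc = new DOMDocument();
    // Collect parser warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($ascii);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $node = $doc->getElementsByTagName('title')->item(0);
    return $node === null ? null : trim($node->nodeValue);
}

echo extract_title_utf8('<html><head><title>سلام دنیا</title></head><body></body></html>'), "\n";
```

DOMDocument hands node values back as UTF-8, so the result can be echoed directly as long as the response itself is sent with a UTF-8 Content-Type header.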

Why doesn't it return UTF-8 in PHP?
