PHP DOMDocument saveHTML not encoding Cyrillic correctly

Date: 2017-11-20 17:17:01

Tags: php utf-8 character-encoding domdocument

I'm using DOMDocument with PHP 7 to manipulate HTML. The problem is that the (Cyrillic) text displays fine on the page, but when I open "View Page Source" it does not. It shows up like this: &#1047;&#1076;&#1077;&#1089;&#1100; &#1086;&#1089;&#1085;
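The references in the page source are the Cyrillic letters written as numeric character references. A quick way to confirm this is to decode them with PHP's built-in html_entity_decode(); this is an illustrative sketch, and the variable names are made up for the example:

```php
<?php
// Sketch: decode the references copied from "View Page Source" to see
// which characters they stand for. Variable names are illustrative only.
$fromSource = '&#1047;&#1076;&#1077;&#1089;&#1100; &#1086;&#1089;&#1085;';

// With a UTF-8 target charset, html_entity_decode() also decodes
// numeric references such as &#1047;.
$decoded = html_entity_decode($fromSource, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $decoded, PHP_EOL; // Здесь осн
```

So the text itself is intact; only its serialization in the HTML output differs.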

What could be wrong? The <meta> charset is utf-8. My code:

$dom = new DOMDocument();
if (@$dom->loadHTML(mb_convert_encoding("<div>$body</div>", 'HTML-ENTITIES', 'UTF-8'), LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD)) {

    // https://stackoverflow.com/questions/29493678/loadhtml-libxml-html-noimplied-on-an-html-fragment-generates-incorrect-tags

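    // Detach the wrapper <div> that was added around $body above.
    $container = $dom->getElementsByTagName('div')->item(0);
    $container = $container->parentNode->removeChild($container);

    // Remove anything left at the top level of the document.
    while ($dom->firstChild)
        $dom->removeChild($dom->firstChild);

    // Move the wrapper's children up to the document level.
    while ($container->firstChild)
        $dom->appendChild($container->firstChild);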

    $xpath = new DOMXPath($dom); 
    $headlines = $xpath->query("//h2");
    // some code..

    return $dom->saveHTML();
}

1 answer:

Answer 0 (score: 1)

The problem is the call $dom->saveHTML(). You need to pass the root node to it as an argument, like this:

return $dom->saveHTML((new \DOMXPath($dom))->query('/')->item(0));

With that change, the page is suddenly rendered differently, without the replacement by entities. If it still isn't, double-check the values of $dom->encoding and $dom->substituteEntities; they should be …
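A minimal sketch of the suggested change, assuming PHP 7.1+ with ext-dom. renderBoth() is a hypothetical helper that parses the same fragment and serializes it both ways, first with a plain saveHTML() and then with the root node from the XPath query '/' passed in, so the two outputs can be compared side by side:

```php
<?php
// Sketch only: compares DOMDocument::saveHTML() with and without a node argument.
// Assumes PHP 7.1+ with ext-dom; renderBoth() is a hypothetical helper name.

/**
 * Parses an HTML fragment and returns [whole-document output, root-node output].
 */
function renderBoth(string $fragment): array
{
    $dom = new DOMDocument();
    // Same flags as the question: no implied <html>/<body>, no default doctype.
    $dom->loadHTML($fragment, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    // No argument: the whole document is serialized. This is the call that
    // produced the &#NNNN; references in the question.
    $withoutNode = $dom->saveHTML();

    // The answer's variant: pass the root node returned by the XPath query '/'.
    $root = (new DOMXPath($dom))->query('/')->item(0);
    $withNode = $dom->saveHTML($root);

    return [$withoutNode, $withNode];
}

// Pure-ASCII input, like the output of the question's
// mb_convert_encoding(..., 'HTML-ENTITIES', 'UTF-8') step.
[$before, $after] = renderBoth('<div>&#1047;&#1076;&#1077;&#1089;&#1100;</div>');

echo $before, PHP_EOL;
echo $after, PHP_EOL;
```

The exact output of the no-argument call can depend on the libxml2 version and on whether the markup declares a charset, so it is worth comparing both printed lines on your own setup rather than relying on one fixed result.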