Preventing loadHTML from modifying markup

Time: 2018-06-04 15:20:37

Tags: php dom xml-parsing

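I have set up a filter using DOMDocument::loadHTML. It reads the given markup and adds the class _light to various HTML elements.

It works well, except that it changes markup of the form <p><noscript> ... </noscript><img ... /></p> into <p></p><noscript> ... </noscript><img ... />. I assume it does this because <noscript> is treated as block-level and therefore should not be nested inside a <p> tag. I would like to disable this automatic reformatting and have the code parsed as-is.

How can I disable the automatic reformatting done by DOMDocument::loadHTML?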
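
One way to confirm where the parser puts an element is to inspect its parent after loading. The following is only a minimal diagnostic sketch: the helper name parent_tag_after_parse is hypothetical and not part of the original filter, and whether <noscript> stays inside <p> is assumed to depend on the libxml2 build that PHP is linked against.

```php
<?php
// Hypothetical diagnostic helper (not part of the original filter):
// parse a fragment with DOMDocument::loadHTML and return the tag name of
// the parent of the first <$tag> element, or null if no such element exists.
function parent_tag_after_parse(string $html, string $tag): ?string
{
    // Collect parser warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $node = $dom->getElementsByTagName($tag)->item(0);
    if ($node === null || $node->parentNode === null) {
        return null;
    }

    return $node->parentNode->nodeName;
}

// An inline element such as <span> stays nested inside <p>.
var_dump(parent_tag_after_parse('<p><span>x</span></p>', 'span'));

// For <noscript> the result is the thing to check on your own installation.
var_dump(parent_tag_after_parse('<p><noscript>x</noscript><img alt=""></p>', 'noscript'));
```

If the second call reports something other than "p", the parser has closed the paragraph before inserting <noscript>, which matches the output shown below.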

Input markup:

<p class="user-content_text text ">
    <noscript>
        <img class="size-full wp-image-99 -round aligncenter" src="http://www.example.com/wp-content/uploads/2018/06/example-photo.jpg" alt="" width="256" height="257" srcset="http://www.example.com/wp-content/uploads/2018/06/example-photo.jpg 256w, http://www.example.com/wp-content/uploads/2018/06/example-photo-150x150.jpg 150w" sizes="(max-width: 256px) 100vw, 256px">
    </noscript>
    <img class="_js size-full wp-image-99 -round aligncenter" alt="" sizes="(max-width: 256px) 100vw, 256px" srcset="http://www.example.com/wp-content/uploads/2018/06/example-photo.jpg 256w, http://www.example.com/wp-content/uploads/2018/06/example-photo-150x150.jpg 150w" width="256" height="257">
</p>

Output markup:

<p class="user-content_text text _light"></p>
<noscript>
    <img class="size-full wp-image-99 -round aligncenter" src="http://www.example.com/wp-content/uploads/2018/06/example-photo.jpg" alt="" width="256" height="257" srcset="http://www.example.com/wp-content/uploads/2018/06/example-photo.jpg 256w, http://www.example.com/wp-content/uploads/2018/06/example-photo-150x150.jpg 150w" sizes="(max-width: 256px) 100vw, 256px">
</noscript>
<img class="_js size-full wp-image-99 -round aligncenter" alt="" sizes="(max-width: 256px) 100vw, 256px" srcset="http://www.example.com/wp-content/uploads/2018/06/example-photo.jpg 256w, http://www.example.com/wp-content/uploads/2018/06/example-photo-150x150.jpg 150w" width="256" height="257">

Desired output markup:

<p class="user-content_text text _light">
    <noscript>
        <img class="size-full wp-image-99 -round aligncenter" src="http://www.example.com/wp-content/uploads/2018/06/example-photo.jpg" alt="" width="256" height="257" srcset="http://www.example.com/wp-content/uploads/2018/06/example-photo.jpg 256w, http://www.example.com/wp-content/uploads/2018/06/example-photo-150x150.jpg 150w" sizes="(max-width: 256px) 100vw, 256px">
    </noscript>
    <img class="_js size-full wp-image-99 -round aligncenter" alt="" sizes="(max-width: 256px) 100vw, 256px" srcset="http://www.example.com/wp-content/uploads/2018/06/example-photo.jpg 256w, http://www.example.com/wp-content/uploads/2018/06/example-photo-150x150.jpg 150w" width="256" height="257">
</p>

Filter:

function light_text($content) {
    global $post;

    if ($content) {
        $DOM = new DOMDocument();
        $DOM->loadHTML(mb_convert_encoding("<html>{$content}</html>", "HTML-ENTITIES", "UTF-8"), LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

        $elements = array(
            "anchor"         => $DOM->getElementsByTagName("a"),
            "heading_1"      => $DOM->getElementsByTagName("h1"),
            "heading_2"      => $DOM->getElementsByTagName("h2"),
            "heading_3"      => $DOM->getElementsByTagName("h3"),
            "heading_4"      => $DOM->getElementsByTagName("h4"),
            "heading_5"      => $DOM->getElementsByTagName("h5"),
            "heading_6"      => $DOM->getElementsByTagName("h6"),
            "paragraph"      => $DOM->getElementsByTagName("p"),
            "ordered_list"   => $DOM->getElementsByTagName("ol"),
            "unordered_list" => $DOM->getElementsByTagName("ul"),
            "table"          => $DOM->getElementsByTagName("table"),
            "figcaption"     => $DOM->getElementsByTagName("figcaption"),
        );

        foreach ($elements as $type => $list) {
            foreach ($list as $element) {
                // getAttribute() returns "" when the attribute is absent.
                $existing_classes = $element->getAttribute("class");

                // trim() avoids a trailing space when there were no classes.
                $element->setAttribute("class", trim("_light {$existing_classes}"));
            }
        }

        // Remove the temporary <html> wrapper added above.
        // remove_root_tag() is a helper defined elsewhere (not shown here).
        $DOM = remove_root_tag($DOM);

        $content = $DOM->saveHTML();
    }

    return $content;
}
add_filter("the_content", "light_text");
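
As far as I know, loadHTML has no option that switches off the parser's auto-closing rules. A possible workaround, sketched below under stated assumptions, is to hide <noscript> behind a placeholder tag name before parsing and to restore it after serializing. The function name light_text_keep_noscript and the placeholder x-noscript are hypothetical. The sketch assumes libxml2 leaves an unknown tag name where it finds it, and it only handles <p> for brevity. It is not a drop-in replacement for the filter above.

```php
<?php
// Hypothetical helper, not from the original post: adds the "_light" class to
// <p> elements while keeping <noscript> nested where it was written.
function light_text_keep_noscript(string $content): string
{
    if ($content === '') {
        return $content;
    }

    // 1. Rename <noscript> to a placeholder tag (assumption: the parser has
    //    no auto-close rule for a tag name it does not know).
    $masked = preg_replace('~<(/?)noscript\b~i', '<$1x-noscript', $content) ?? $content;

    // 2. Parse inside a full document; the <meta> declares UTF-8 input.
    $previous = libxml_use_internal_errors(true); // keep "invalid tag" warnings quiet
    $dom = new DOMDocument();
    $dom->loadHTML(
        '<!DOCTYPE html><html><head>'
        . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '</head><body>' . $masked . '</body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // 3. Same class manipulation as in the filter, limited to <p> here.
    foreach ($dom->getElementsByTagName('p') as $element) {
        $existing = $element->getAttribute('class');
        $element->setAttribute('class', trim("_light {$existing}"));
    }

    // 4. Serialize only the children of <body>, so no wrapper tags remain.
    $html = '';
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $html .= $dom->saveHTML($child);
        }
    }

    // 5. Restore the real tag name in the serialized output.
    return preg_replace('~<(/?)x-noscript\b~i', '<$1noscript', $html) ?? $html;
}
```

Because the body children are serialized one by one, a separate step for stripping a root tag is not needed in this sketch. Content that already contains a literal x-noscript tag would be renamed on the way out, so a less collision-prone placeholder may be preferable in practice.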

0 answers:

No answers