HTMLPurifier adds <p> around images

Posted: 2019-07-12 18:36:08

Tags: htmlpurifier

Hi, for some reason HTMLPurifier (v4.10) is adding a paragraph tag around img tags and removing their style attribute.

Here is my config:

    $config->set("AutoFormat.AutoParagraph", false);
    $config->set("AutoFormat.RemoveEmpty.Predicate", ['colgroup' => [], 'th' => [], 'td' => [], 'iframe' => ['src'], 'div' => ['class']]);
    $config->set("AutoFormat.RemoveEmpty", true);
    $config->set("Core.NormalizeNewlines", true);
    $config->set('HTML.Allowed', "*[style|class],h1, h2, h3, h4, h5, h6, hr, pre, code, blockquote, table, tr, td, th, thead, tbody, strong, em, b, i, u, s, span, a, p, br, ul, ol, li, img[src], div");

Here is an example of what I'm trying to purify:

    <div class="some_class">
    <img src='http://example.com/test.jpg' style='position:absolute;top:0px;left:0px;width:100%'>
    </div>

What comes out is:

    <div class="some_class">
    <p><img src='http://example.com/test.jpg' alt='test' /></p>
    </div>

Setting Core.RemoveInvalidImg to true or false doesn't change anything either; the same thing happens.

Edit:

Here is the full code sample:

    <?php
    require 'vendor/autoload.php';

    // Load the source file and collect the contents of its <style> blocks.
    $my_text = file_get_contents('test3.md');
    preg_match_all('/[<]style[^>]*[>]([^<]+)[<]\/style[>]/', $my_text, $matches, PREG_OFFSET_CAPTURE);

    echo '<!DOCTYPE html><html><head><style type="text/css">';

    // Re-emit the extracted CSS in the page head.
    for ($i = 0; $i < count($matches[1]); $i++) {
        echo $matches[1][$i][0] . "\n\n";
    }

    echo '</style></head><body>';

    $config = HTMLPurifier_Config::createDefault();
    $config->set("AutoFormat.AutoParagraph", false);
    $config->set("AutoFormat.RemoveEmpty.Predicate", ['colgroup' => [], 'th' => [], 'td' => [], 'iframe' => ['src'], 'div' => ['class']]);
    $config->set("AutoFormat.RemoveEmpty", true);
    $config->set("Core.NormalizeNewlines", true);
    $config->set('HTML.Allowed', "*[style|class],h1, h2, h3, h4, h5, h6, hr, pre, code, blockquote, table, tr, td, th, thead, tbody, strong, em, b, i, u, s, span, a, p, br, ul, ol, li, img[src], div");

    $Purifier = new HTMLPurifier($config);

    // Purify the file contents loaded above.
    echo $Purifier->purify($my_text);

    ?>
    </body></html>

1 Answer

Answer (score: 0)

This file:

    <?php

    require_once(__DIR__ . '/library/HTMLPurifier.auto.php');

    $config = HTMLPurifier_Config::createDefault();

    $config->set("AutoFormat.AutoParagraph", false);
    $config->set("AutoFormat.RemoveEmpty.Predicate", ['colgroup' => [], 'th' => [], 'td' => [], 'iframe' => ['src'], 'div' => ['class']]);
    $config->set("AutoFormat.RemoveEmpty", true);
    $config->set("Core.NormalizeNewlines", true);
    $config->set('HTML.Allowed', "*[style|class],h1, h2, h3, h4, h5, h6, hr, pre, code, blockquote, table, tr, td, th, thead, tbody, strong, em, b, i, u, s, span, a, p, br, ul, ol, li, img[src], div");

    $dirty_html = "<div class=\"some_class\">
    <img src='http://example.com/test.jpg' style='position:absolute;top:0px;left:0px;width:100%'>
    </div>";

    $purifier = new HTMLPurifier($config);
    echo $purifier->purify($dirty_html);

...gives me the following output:

    <div class="some_class">
    <img src="http://example.com/test.jpg" alt="test.jpg" /></div>
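Note that the `style` attribute is missing from this isolated output too, so that part of the behavior does come from HTML Purifier itself. Listing `style` in `HTML.Allowed` only lets the attribute through; each CSS declaration inside it is then validated on its own. As far as we understand HTML Purifier 4.x, `position`, `top` and `left` are only accepted when `CSS.Trusted` is enabled, and `width` on an `img` is limited by `CSS.MaxImgLength` (default `1200px`), which does not accept percentages. So `width:100%` is dropped as well and nothing is left of the attribute. Below is a sketch of the two settings, assuming the same `$config` object as above. Check the configuration reference for your version, and relax these only for trusted input:

```php
// Assumes $config is the HTMLPurifier_Config object built above.

// Enable the extended CSS property set (position, top, left, ...).
// Only appropriate when the HTML comes from trusted authors.
$config->set('CSS.Trusted', true);

// Remove the img width/height cap (default '1200px'); as we understand it,
// this also lets percentage values such as width:100% through.
$config->set('CSS.MaxImgLength', null);
```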

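Can you tell us more about your setup? It looks to me as though something else is going on, for example something intercepting your purify call and doing completely different processing (such as applying auto-paragraph logic). Note, for instance, that you would expect HTML Purifier to change your single-quoted attribute values into double-quoted ones, and according to your question that did not happen.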

See if you can rule out any third-party involvement.
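The reasoning above can be turned into a quick check. The sketch below is a hypothetical helper, `suspicious_purifier_output()`, which is not part of HTML Purifier. It is a regex heuristic that assumes, as the isolated test above suggests, that `purify()` writes attribute values in double quotes and, with `AutoFormat.AutoParagraph` off, does not wrap a lone `<img>` in `<p>`. Running it on the string your page actually prints, and on a direct `purify()` result, can hint at whether another layer is rewriting the HTML.

```php
<?php
/**
 * Hypothetical helper, not part of HTML Purifier: a regex heuristic that
 * lists signs that an HTML string did not come straight out of
 * HTMLPurifier::purify() with AutoFormat.AutoParagraph disabled.
 */
function suspicious_purifier_output(string $html): array
{
    $reasons = [];

    // Assumption: purify() serializes attribute values in double quotes,
    // so name='value' suggests the string bypassed (or was rewritten after) it.
    if (preg_match('/\s[a-zA-Z-]+\s*=\s*\'/', $html) === 1) {
        $reasons[] = 'single-quoted attribute values';
    }

    // With AutoParagraph off, a lone <img> should not end up inside <p>...</p>.
    if (preg_match('#<p>\s*<img\b[^>]*>\s*</p>#i', $html) === 1) {
        $reasons[] = '<img> wrapped in <p>';
    }

    return $reasons;
}

// Output reported in the question vs. output of the isolated test in the answer.
$reported = "<div class=\"some_class\">\n<p><img src='http://example.com/test.jpg' alt='test' /></p>\n</div>";
$isolated = "<div class=\"some_class\">\n<img src=\"http://example.com/test.jpg\" alt=\"test.jpg\" /></div>";

print_r(suspicious_purifier_output($reported)); // both reasons listed
print_r(suspicious_purifier_output($isolated)); // empty array
```

Being regex-based, this is only meant for eyeballing small snippets like the ones in this thread; it is not an HTML parser.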