Hello. For some reason, HTMLPurifier (v4.10) wraps img tags in a paragraph tag and removes their style attribute.
Here is my configuration:
$config->set("AutoFormat.AutoParagraph", false);
$config->set("AutoFormat.RemoveEmpty.Predicate", ['colgroup' => [], 'th' => [], 'td' => [], 'iframe' => ['src'], 'div' => ['class']]);
$config->set("AutoFormat.RemoveEmpty", true);
$config->set("Core.NormalizeNewlines", true);
$config->set('HTML.Allowed', "*[style|class],h1, h2, h3, h4, h5, h6, hr, pre, code, blockquote, table, tr, td, th, thead, tbody, strong, em, b, i, u, s, span, a, p, br, ul, ol, li, img[src], div");
Here is an example of what I am trying to purify:
<div class="some_class">
<img src='http://example.com/test.jpg' style='position:absolute;top:0px;left:0px;width:100%'>
</div>
What ends up coming out is:
<div class="some_class">
<p><img src='http://example.com/test.jpg' alt='test' /></p>
</div>
Also, setting Core.RemoveInvalidImg to true or false makes no difference; the same thing happens.
Edit:
Here is the full code sample:
<?php
require 'vendor/autoload.php';
$my_text = file_get_contents('test3.md');
preg_match_all('/[<]style[^>]*[>]([^<]+)[<]\/style[>]/', $my_text, $matches, PREG_OFFSET_CAPTURE);
echo '<!DOCTYPE html><html><head><style type="text/css">';
for($i = 0; $i < count($matches[1]); $i++) {
echo $matches[1][$i][0]."\n\n";
}
echo '</style></head><body>';
$config = HTMLPurifier_Config::createDefault();
$config->set("AutoFormat.AutoParagraph", false);
$config->set("AutoFormat.RemoveEmpty.Predicate", ['colgroup' => [], 'th' => [], 'td' => [], 'iframe' => ['src'], 'div' => ['class']]);
$config->set("AutoFormat.RemoveEmpty", true);
$config->set("Core.NormalizeNewlines", true);
$config->set('HTML.Allowed', "*[style|class],h1, h2, h3, h4, h5, h6, hr, pre, code, blockquote, table, tr, td, th, thead, tbody, strong, em, b, i, u, s, span, a, p, br, ul, ol, li, img[src], div");
$Purifier = new HTMLPurifier($config);
echo $Purifier->purify($page);
?>
</body></html>
Answer 0 (score: 0)
This file:
<?php
require_once(__DIR__ . '/library/HTMLPurifier.auto.php');
$config = HTMLPurifier_Config::createDefault();
$config->set("AutoFormat.AutoParagraph", false);
$config->set("AutoFormat.RemoveEmpty.Predicate", ['colgroup' => [], 'th' => [], 'td' => [], 'iframe' => ['src'], 'div' => ['class']]);
$config->set("AutoFormat.RemoveEmpty", true);
$config->set("Core.NormalizeNewlines", true);
$config->set('HTML.Allowed', "*[style|class],h1, h2, h3, h4, h5, h6, hr, pre, code, blockquote, table, tr, td, th, thead, tbody, strong, em, b, i, u, s, span, a, p, br, ul, ol, li, img[src], div");
$dirty_html = "<div class=\"some_class\">
<img src='http://example.com/test.jpg' style='position:absolute;top:0px;left:0px;width:100%'>
</div>";
$purifier = new HTMLPurifier($config);
echo $purifier->purify($dirty_html);
...gives me the following output:
<div class="some_class">
<img src="http://example.com/test.jpg" alt="test.jpg" /></div>
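Can you tell us more about your setup? It looks to me as though something else is going on, such as something intercepting your purify call and processing it quite differently (for example, applying auto-paragraph logic). Note, for instance, that one would expect HTML Purifier to change your single-quoted attribute values to double-quoted ones, which, according to your question, did not happen.
Can you see whether you can rule out any third-party involvement?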
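One possible way to check for third-party involvement is to confirm which file the HTMLPurifier class is actually loaded from, and then run a minimal purify call on a fresh instance. The sketch below is only an illustration: `locate_class` is a hypothetical helper name, the Composer autoload path is an assumption about your install, and turning off the definition cache is a guess at one thing worth ruling out, not a confirmed cause.

```php
<?php
// Assumption: the library is installed via Composer. Uncomment if so.
// require 'vendor/autoload.php';

// Hypothetical helper: return the file a class was loaded from,
// or null if the class is unknown or built into PHP.
function locate_class(string $class): ?string
{
    if (!class_exists($class)) {
        return null;
    }
    $file = (new ReflectionClass($class))->getFileName();
    return $file === false ? null : $file;
}

if (class_exists('HTMLPurifier')) {
    // A path outside the expected library directory would suggest a wrapper or fork.
    echo 'Loaded from: ', locate_class('HTMLPurifier'), "\n";
    echo 'Version: ', HTMLPurifier::VERSION, "\n";

    $config = HTMLPurifier_Config::createDefault();
    // Disable the definition cache so a stale cached definition cannot interfere.
    $config->set('Cache.DefinitionImpl', null);
    $config->set('AutoFormat.AutoParagraph', false);

    // Purify a minimal input directly, with nothing else in between.
    $purifier = new HTMLPurifier($config);
    echo $purifier->purify("<div><img src='http://example.com/test.jpg'></div>"), "\n";
}
```

If this minimal call does not add a paragraph tag but your full script does, the difference most likely lies in whatever else touches the text before or after the purify call.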