Question

Premise

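I want to use HTML Purifier to transform <body> tags into <div> tags, so that the inline styles on the <body> element are preserved. For example, <body style="background:color#000000;">Hi there.</body> would turn into <div style="background:color#000000;">Hi there.</div>. I'm looking at a combination of a custom tag and the TagTransform class.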

Current setup

In my configuration section, I'm currently doing this:

$htmlDef  = $this->configuration->getHTMLDefinition(true);
// defining the element to avoid triggering 'Element 'body' is not supported'
$bodyElem = $htmlDef->addElement('body', 'Block', 'Flow', 'Core');
$bodyElem->excludes = array('body' => true);
// add the transformation rule
$htmlDef->info_tag_transform['body'] = new HTMLPurifier_TagTransform_Simple('div');

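...as well as allowing <body> and its style (and class and id) attributes via the configuration directives (they're part of a large list that gets parsed into HTML.AllowedElements and HTML.AllowedAttributes).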

I've turned definition caching off:

$config->set('Cache.DefinitionImpl', null);

Unfortunately, with this setup the transform() method of HTMLPurifier_TagTransform_Simple never seems to be called.

HTML.Parent？

I presume the culprit is my HTML.Parent, which is set to 'div', since, quite naturally, <div> does not allow a child <body> element. However, setting HTML.Parent to 'html' nets me:

ErrorException: Cannot use unrecognized element as parent

...Adding

$htmlElem = $htmlDef->addElement('html', 'Block', 'Flow', 'Core');
$htmlElem->excludes = array('html' => true);

...got rid of that error message, but the tag still isn't transformed; it's removed instead.

...Adding

$htmlElem = $htmlDef->addElement('html', 'Block', 'Custom: head?, body', 'Core');
$htmlElem->excludes = array('html' => true);

...didn't help either, because it nets me an error message:

ErrorException: Trying to get property of non-object       

[...]/library/HTMLPurifier/Strategy/FixNesting.php:237
[...]/library/HTMLPurifier/Strategy/Composite.php:18
[...]/library/HTMLPurifier.php:181
[...]

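I'm still tinkering with that last option to figure out the exact syntax I need to provide, but if anyone can help based on their own past experience, I'd appreciate any pointers in the right direction.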

HTML.TidyLevel？

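The only other culprit I can imagine is my HTML.TidyLevel, which is set to 'heavy'. I haven't tried every possible combination yet, but so far it hasn't made any difference.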

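(Since I've only been poking at this in passing, I can't reliably recall which combinations I've already tried. I'd list them here otherwise, but I'm not confident I wouldn't miss something I did or misreport something. I may edit this section later once I've done some dedicated testing, though!)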

Full configuration

My configuration data is stored in JSON and then parsed into HTML Purifier. Here's the file:

{
    "CSS" : {
        "MaxImgLength" : "800px"
    },
    "Core" : {
        "CollectErrors" : true,
        "HiddenElements" : {
            "script"   : true,
            "style"    : true,
            "iframe"   : true,
            "noframes" : true
        },
        "RemoveInvalidImg" : false
    },
    "Filter" : {
        "ExtractStyleBlocks" : true
    },
    "HTML" : {
        "MaxImgLength" : 800,
        "TidyLevel"    : "heavy",
        "Doctype"      : "XHTML 1.0 Transitional",
        "Parent"       : "html"
    },
    "Output" : {
        "TidyFormat"   : true
    },
    "Test" : {
        "ForceNoIconv" : true
    },
    "URI" : {
        "AllowedSchemes" : {
            "http"     : true,
            "https"    : true,
            "mailto"   : true,
            "ftp"      : true
        },
        "DisableExternalResources" : true
    }
}

(URI.Base, URI.Munge and Cache.SerializerPath are also set, but I've removed them from this paste. Also, a caveat on HTML.Parent: as mentioned above, it's normally set to 'div'.)
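Since the file above is nested by namespace, one way to feed it to HTML Purifier is to flatten it into the dotted "Namespace.Directive" names used throughout this question (e.g. HTML.Parent, Cache.DefinitionImpl). This is a minimal sketch, not necessarily how the asker's own loader works; it assumes PHP 7.3+ (for JSON_THROW_ON_ERROR), and the function name flattenPurifierConfig is hypothetical.

```php
<?php
// Hypothetical helper: turn {"Namespace": {"Directive": value}} JSON into
// flat "Namespace.Directive" => value pairs.
function flattenPurifierConfig(string $json): array
{
    // Decode to associative arrays; throws JsonException on malformed input.
    $nested = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    $flat = array();
    foreach ($nested as $namespace => $directives) {
        foreach ($directives as $directive => $value) {
            $flat[$namespace . '.' . $directive] = $value;
        }
    }
    return $flat;
}

$json = '{"HTML": {"Parent": "div", "TidyLevel": "heavy"}, "Core": {"CollectErrors": true}}';
foreach (flattenPurifierConfig($json) as $key => $value) {
    echo $key, ' = ', var_export($value, true), PHP_EOL;
}
```

Each resulting pair could then be passed to $config->set($key, $value), the same call used above for Cache.DefinitionImpl.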

Answer 1

This code is why what you're doing isn't working:

/**
 * Takes a string of HTML (fragment or document) and returns the content
 * @todo Consider making protected
 */
public function extractBody($html) {
    $matches = array();
    $result = preg_match('!<body[^>]*>(.*)</body>!is', $html, $matches);
    if ($result) {
        return $matches[1];
    } else {
        return $html;
    }
}

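You can turn this off by setting %Core.ConvertDocumentToFragment to false; if the rest of your code is bug-free, it should work straight away from there. I don't believe your bodyElem definition is necessary.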
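The effect of that method can be seen by running it on its own. This is a minimal sketch that lifts the function out of its class and keeps the regex exactly as quoted above; nothing else from the library is involved. It shows that the <body ...> start tag, including its style attribute, is discarded before any tag transform would get a chance to run.

```php
<?php
// Standalone copy of the extractBody() method quoted above, as a plain function.
function extractBody(string $html): string
{
    $matches = array();
    // Capture everything between <body ...> and </body>; the tag itself is dropped.
    $result = preg_match('!<body[^>]*>(.*)</body>!is', $html, $matches);
    return $result ? $matches[1] : $html;
}

$input = '<body style="background:color#000000;">Hi there.</body>';

// Only the inner content survives; the style attribute is gone.
var_dump(extractBody($input)); // string(9) "Hi there."
```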

Answer 2

Wouldn't it be much easier to do this:

$search = array('<body', 'body>');
$replace = array('<div', 'div>');

$html = '<body style="background:color#000000;">Hi there.</body>';

echo str_replace($search, $replace, $html);

>> '<div style="background:color#000000;">Hi there.</div>';
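A caveat on the str_replace() approach: it is case-sensitive and also matches the "body>" inside <tbody> and </tbody>, turning them into <tdiv> and </tdiv>. A slightly more defensive variant of the same idea is sketched below; the function name bodyToDiv is hypothetical, and like any regex over HTML it would still match "<body" inside comments or attribute values, so it is only a pre-processing step, not a parser.

```php
<?php
// Hypothetical helper: rename <body ...> / </body> to <div ...> / </div>,
// keeping attributes and leaving tags such as <tbody> untouched.
function bodyToDiv(string $html): string
{
    // Opening tag: "<body" followed by whitespace, "/" or ">" (case-insensitive).
    $html = preg_replace('~<body(?=[\s/>])~i', '<div', $html);
    // Closing tag: "</body", optional whitespace, then ">".
    return preg_replace('~</body\s*>~i', '</div>', $html);
}

echo bodyToDiv('<body style="background:color#000000;">Hi there.</body>'), PHP_EOL;
// <div style="background:color#000000;">Hi there.</div>
```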

HTML Purifier: Converting <body> to <div>
