Python html-sanitizer: allow img tags

Time: 2019-03-21 15:02:50

Tags: python html html-sanitizing

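Hi everyone, I am using the html-sanitizer Python package, but I cannot get img tags through because they are disabled by default.

I tried editing sanitizer.py in site-packages (as shown below), but still no luck.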

DEFAULT_SETTINGS = {
    "tags": {
        "a",
        "h1",
        "h2",
        "h3",
        "strong",
        "em",
        "p",
        "ul",
        "ol",
        "li",
        "br",
        "sub",
        "sup",
        "hr",
        "img"
    },
    "attributes": {"a": ("href", "name", "target", "title", "id", "rel"),"img": ("src")},
    "empty": {"hr", "a", "br"},
    "separate": {"a", "p", "li"},
    "whitespace": {"br"},
    "add_nofollow": False,
    "autolink": False,
    "sanitize_href": sanitize_href,
    "element_preprocessors": [
        # convert span elements into em/strong if a matching style rule
        # has been found. strong has precedence, strong & em at the same
        # time is not supported
        bold_span_to_strong,
        italic_span_to_em,
        tag_replacer("b", "strong"),
        tag_replacer("i", "em"),
        tag_replacer("form", "p"),
        target_blank_noopener,
    ],
    "element_postprocessors": [],
}

Can anyone help me? I want img tags to be allowed, with only the src attribute.

1 answer:

Answer 0 (score: 0)

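The sanitizer will not use DEFAULT_SETTINGS if different settings were provided through Sanitizer(settings={...}). That may be what is happening here, but I suspect the real problem is the empty setting.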

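The sanitizer removes empty tags, for example <em></em> is cleaned to ''. That is fine, but <img .../> also counts as an empty tag (that is, it has no children), so the sanitizer cleans it away too.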
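To make the effect concrete, here is a toy model of that rule. It is not html-sanitizer's actual code: it uses the standard library's xml.etree.ElementTree, and the names drop_empty and keep_empty are made up for this illustration. It only assumes the behaviour described above, namely that an element with no text and no children is dropped unless its tag is on an allow-list of empty tags.

```python
import xml.etree.ElementTree as ET


def drop_empty(root, keep_empty):
    """Toy rule: remove elements that have no text and no children,
    unless their tag is listed in keep_empty. Repeats until nothing
    changes, because removing a child can leave its parent empty.
    Tail text after a removed element is not preserved here."""
    changed = True
    while changed:
        changed = False
        # Snapshot the tree so removals do not disturb the iteration.
        for parent in list(root.iter()):
            for child in list(parent):
                is_empty = len(child) == 0 and not (child.text or "").strip()
                if is_empty and child.tag not in keep_empty:
                    parent.remove(child)
                    changed = True
    return root


SOURCE = '<div><em></em><p><img src="/a.png" /></p></div>'

# Without "img" on the allow-list, the image and its wrapper both disappear.
without_img = ET.tostring(
    drop_empty(ET.fromstring(SOURCE), {"hr", "a", "br"}), encoding="unicode"
)

# With "img" on the allow-list, only the truly empty <em> is removed.
with_img = ET.tostring(
    drop_empty(ET.fromstring(SOURCE), {"hr", "a", "br", "img"}), encoding="unicode"
)

print(without_img)
print(with_img)
```

In this toy version the image only survives in the second call, which mirrors why the tag has to be listed as allowed-when-empty and not just as an allowed tag.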

You need to add img to settings['empty'], alongside the current {"hr", "a", "br"} set.

While you are at it, do not edit the DEFAULT; define your own settings instead (starting from a copy of the DEFAULT). For example:

import copy

from html_sanitizer.sanitizer import DEFAULT_SETTINGS, Sanitizer

# Make a deep copy so the nested sets and dicts in DEFAULT_SETTINGS stay untouched
my_settings = copy.deepcopy(DEFAULT_SETTINGS)

# Add your changes
my_settings['tags'].add('img')
my_settings['empty'].add('img')
my_settings['attributes'].update({'img': ('src', )})

# Use it
s = Sanitizer(settings=my_settings)
s.sanitize('<em><img src="/index.html"/></em>')
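Two plain-Python details are easy to trip over when building a settings dictionary like this, and both can be checked without the library. The sketch below uses a made-up DEFAULTS dictionary as a stand-in for the real defaults. It shows that dict(...) copies only the top level, so calling .add() on a nested set also changes the original, whereas copy.deepcopy does not. It also shows that ("src") without a trailing comma is a string, not a one-element tuple.

```python
import copy

# Hypothetical stand-in for a library's module-level default settings.
DEFAULTS = {
    "tags": {"a", "p"},
    "empty": {"hr", "a", "br"},
    "attributes": {"a": ("href",)},
}

# A shallow copy creates a new outer dict but shares the inner sets.
shallow = dict(DEFAULTS)
shallow["empty"].add("img")
defaults_mutated = "img" in DEFAULTS["empty"]

# Undo the change so the second half starts from clean defaults.
DEFAULTS["empty"].discard("img")

# A deep copy duplicates the inner sets as well.
deep = copy.deepcopy(DEFAULTS)
deep["empty"].add("img")
defaults_untouched = "img" not in DEFAULTS["empty"]

# Parentheses alone do not make a tuple; the trailing comma does.
not_a_tuple = ("src")
one_tuple = ("src",)

print(defaults_mutated, defaults_untouched)
print(type(not_a_tuple).__name__, type(one_tuple).__name__)
```

The second point matters for the attributes entry: writing "img": ("src") stores the string "src", so the trailing comma in ('src', ) is what makes it a tuple of attribute names.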