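Hi everyone, I'm using the html-sanitizer Python package, but I can't enable the img tag, which is disabled by default.
I tried editing sanitizer.py in site-packages (as shown below), but still no luck.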
DEFAULT_SETTINGS = {
    "tags": {
        "a",
        "h1",
        "h2",
        "h3",
        "strong",
        "em",
        "p",
        "ul",
        "ol",
        "li",
        "br",
        "sub",
        "sup",
        "hr",
        "img"
    },
    "attributes": {"a": ("href", "name", "target", "title", "id", "rel"), "img": ("src")},
    "empty": {"hr", "a", "br"},
    "separate": {"a", "p", "li"},
    "whitespace": {"br"},
    "add_nofollow": False,
    "autolink": False,
    "sanitize_href": sanitize_href,
    "element_preprocessors": [
        # convert span elements into em/strong if a matching style rule
        # has been found. strong has precedence, strong & em at the same
        # time is not supported
        bold_span_to_strong,
        italic_span_to_em,
        tag_replacer("b", "strong"),
        tag_replacer("i", "em"),
        tag_replacer("form", "p"),
        target_blank_noopener,
    ],
    "element_postprocessors": [],
}
Can anyone help me? I want the img tag with only the src attribute.

Answer 0 (score: 0)
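If you provide different settings with Sanitizer(settings={...}), the sanitizer will not use DEFAULT_SETTINGS. That may be what is happening here, but I suspect the empty setting is the real culprit.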
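The sanitizer also removes empty tags; for example, <em></em> is cleaned to ''. That is fine, but <img .../> is also an empty tag (that is, it has no child tags), so the sanitizer cleans it away as well.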
You need to add img to the settings['empty'] set, alongside the current {"hr", "a", "br"}.
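To make the effect of the empty setting concrete, here is a minimal sketch of that kind of pruning using only the standard library. It is a simplified model written for illustration, not html-sanitizer's actual implementation: prune_empty and render are hypothetical helper names, and the input is assumed to be well-formed enough for xml.etree to parse.

```python
import xml.etree.ElementTree as ET


def prune_empty(root, keep_empty):
    """Drop descendants that have no children and no text, unless their
    tag is listed in keep_empty. Simplified model, not the library's code."""
    for child in list(root):
        # Recurse first, so a parent emptied by pruning is dropped as well.
        prune_empty(child, keep_empty)
        has_text = bool((child.text or "").strip())
        if len(child) == 0 and not has_text and child.tag not in keep_empty:
            # Note: any tail text after the removed element is dropped too,
            # which is acceptable for this sketch.
            root.remove(child)
    return root


def render(html, keep_empty):
    """Wrap the fragment in a <div>, prune it, and serialize it again."""
    root = ET.fromstring("<div>" + html + "</div>")
    prune_empty(root, keep_empty)
    return ET.tostring(root, encoding="unicode")


fragment = '<em><img src="/index.html"/></em>'
default_empty = {"hr", "a", "br"}

# With the default set, <img> is pruned, and then the now-empty <em> goes too.
print(render(fragment, default_empty))
# With "img" added to the set, both elements survive.
print(render(fragment, default_empty | {"img"}))
```

In this model, allowing img in tags is not enough on its own: the element also has to be listed among the tags that are allowed to be empty.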
While you are at it, don't edit the DEFAULT; define your own settings instead (starting from a copy of the DEFAULT). For example:
import copy
import html_sanitizer

# Make a deep copy; dict() alone would share the nested sets with DEFAULT_SETTINGS
my_settings = copy.deepcopy(html_sanitizer.sanitizer.DEFAULT_SETTINGS)
# Add your changes
my_settings['tags'].add('img')
my_settings['empty'].add('img')
my_settings['attributes'].update({'img': ('src', )})
# Use it
s = html_sanitizer.Sanitizer(settings=my_settings)
s.sanitize('<em><img src="/index.html"/></em>')
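
Two plain-Python details are worth checking when building the custom settings. First, dict(...) makes only a shallow copy, so the nested sets are still the same objects as in the defaults, and calling .add('img') on them changes the defaults too; copy.deepcopy avoids that. Second, ("src") without a trailing comma, as in the question, is a string rather than a one-element tuple. The sketch below demonstrates both with a small stand-in dictionary. DEFAULTS and with_img are hypothetical names used only for illustration, so the example runs without the library installed.

```python
import copy

# Stand-in for html_sanitizer.sanitizer.DEFAULT_SETTINGS (relevant keys only).
DEFAULTS = {
    "tags": {"a", "p", "br", "hr"},
    "attributes": {"a": ("href", "title")},
    "empty": {"hr", "a", "br"},
}


def with_img(base):
    """Return new settings that allow <img src=...>; base is left untouched."""
    settings = copy.deepcopy(base)
    settings["tags"].add("img")
    settings["empty"].add("img")
    settings["attributes"]["img"] = ("src",)  # trailing comma makes a tuple
    return settings


# A shallow copy shares the nested set objects with the original.
shallow = dict(DEFAULTS)
shares_sets = shallow["tags"] is DEFAULTS["tags"]

custom = with_img(DEFAULTS)
defaults_untouched = "img" not in DEFAULTS["tags"]

# ("src") is just the string "src"; `in` on a string is a substring test.
src_is_str = isinstance(("src"), str)
substring_match = "s" in ("src")

print(shares_sets, defaults_untouched, src_is_str, substring_match)
```

The resulting dictionary would then be passed as Sanitizer(settings=...), exactly as in the answer's example.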