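I am trying to create a div element from the string below, which contains HTML entities. Because of that, the reserved character & in each entity is escaped as &amp; in the output, so the entities show up as plain text. How can I avoid this so that the HTML entities are rendered properly?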
from lxml import etree
import lxml.html

s = 'Actress Adamari L&#243;pez And Amgen Launch Spanish-Language Chemotherapy: Myths Or Facts&#8482; Website And Resources'
div = etree.Element("div")
div.text = s
lxml.html.tostring(div)
Output:
<div>Actress Adamari L&amp;#243;pez And Amgen Launch Spanish-Language Chemotherapy: Myths Or Facts&amp;#8482; Website And Resources</div>
Answer 0 (score: 3)
You can specify encoding when calling tostring():
>>> from lxml.html import fromstring, tostring
>>> s = 'Actress Adamari L&#243;pez And Amgen Launch Spanish-Language Chemotherapy: Myths Or Facts&#8482; Website And Resources'
>>> div = fromstring(s)
>>> print(tostring(div, encoding='unicode'))
<p>Actress Adamari López And Amgen Launch Spanish-Language Chemotherapy: Myths Or Facts™ Website And Resources</p>
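If the element has to be built by hand, as in the question, one alternative is to decode the entities before assigning the text, so that the serializer sees real characters rather than a literal &. This is a minimal sketch and not part of the original answer: it assumes Python 3, uses the standard library's html.unescape, and falls back to xml.etree.ElementTree only so that it still runs where lxml is not installed. It serializes with etree.tostring(..., method="html"), which produces HTML rather than XML output.

```python
import html

try:
    from lxml import etree  # the library used in this thread
except ImportError:
    # Assumption: the stdlib ElementTree API is close enough for this sketch.
    import xml.etree.ElementTree as etree

s = ('Actress Adamari L&#243;pez And Amgen Launch Spanish-Language '
     'Chemotherapy: Myths Or Facts&#8482; Website And Resources')

div = etree.Element("div")
# Decode &#243; and &#8482; into the characters they stand for.
div.text = html.unescape(s)

# encoding="unicode" returns a str and leaves non-ASCII characters as they are.
result = etree.tostring(div, method="html", encoding="unicode")
print(result)
```

Any literal & that is still present in the text after decoding will continue to be escaped as &amp;, which is what a browser expects.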
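As a side note, you should definitely use lxml.html.tostring() when dealing with HTML data:

Note that you should use lxml.html.tostring and not lxml.tostring. lxml.tostring(doc) will return the XML representation of the document, which is not valid HTML. In particular, things like <script src="..."></script> will be serialized as <script src="..." />, which confuses browsers.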
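The difference described in the quote can be reproduced with a short sketch. The file name app.js is made up for illustration, and the stdlib fallback is an assumption so that the code also runs without lxml. lxml.html.tostring() serializes with the HTML method by default, which is what method="html" selects here.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: stdlib ElementTree behaves the same way for this comparison.
    import xml.etree.ElementTree as etree

script = etree.Element("script")
script.set("src", "app.js")  # hypothetical file name

# XML serialization collapses the empty element into a self-closing tag.
as_xml = etree.tostring(script, encoding="unicode")

# HTML serialization keeps the explicit closing tag that browsers need.
as_html = etree.tostring(script, method="html", encoding="unicode")

print(as_xml)
print(as_html)
```

The first line printed ends in a self-closing tag, while the second keeps the separate </script> closing tag.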