Preventing lxml from modifying tags

Date: 2014-11-19 15:43:30

Tags: python html lxml

How can I prevent lxml from modifying tags?

from lxml import etree
from lxml.html.soupparser import fromstring

html = '<iframe width="560" height="315" src="" frameborder="0" allowfullscreen></iframe>'
root = fromstring(html)
print etree.tostring(root,encoding='utf-8')

This prints the short, self-closing form of the tag:

'<iframe width="560" height="315" src="" frameborder="0" allowfullscreen/>'

How can I prevent this? The output I need is:

'<iframe width="560" height="315" src="" frameborder="0" allowfullscreen></iframe>'

1 answer:

Answer 0: (score: 2)

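By default, tostring() serializes as XML, which collapses an empty element into a self-closing tag. Pass method="html" to tostring() to get HTML serialization instead:
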
print etree.tostring(root.find('iframe'), encoding='utf-8', method="html")

Demo:

>>> from lxml import etree
>>> from lxml.html.soupparser import fromstring
>>>
>>> html = '<iframe width="560" height="315" src="" frameborder="0" allowfullscreen></iframe>'
>>> root = fromstring(html)
>>> print etree.tostring(root.find('iframe'), encoding='utf-8', method="html")
<iframe allowfullscreen="allowfullscreen" frameborder="0" height="315" src="" width="560"></iframe>
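To make the difference between the two serialization methods easier to see, here is a minimal sketch that serializes the same element both ways. It rests on a few assumptions that are not part of the original answer: it targets Python 3, it parses with lxml.html.fragment_fromstring instead of the BeautifulSoup-based soupparser (so attribute order and the rendering of allowfullscreen may differ from the demo above), and the helper name serialize_variants is purely illustrative. It also tries lxml.html.tostring, which uses method="html" by default.

```python
from lxml import etree, html

SOURCE = ('<iframe width="560" height="315" src="" '
          'frameborder="0" allowfullscreen></iframe>')


def serialize_variants(markup):
    """Serialize one HTML element three ways (illustrative helper)."""
    # Assumption: parse with lxml's own HTML parser, not soupparser.
    element = html.fragment_fromstring(markup)

    # Default method is "xml": an empty element becomes a self-closing tag.
    as_xml = etree.tostring(element, encoding="unicode")

    # method="html" keeps the explicit closing tag.
    as_html = etree.tostring(element, encoding="unicode", method="html")

    # lxml.html.tostring defaults to method="html".
    via_helper = html.tostring(element, encoding="unicode")

    return as_xml, as_html, via_helper


as_xml, as_html, via_helper = serialize_variants(SOURCE)
print(as_xml)      # should end with "/>"
print(as_html)     # should end with "></iframe>"
print(via_helper)  # should end with "></iframe>"
```

Passing encoding="unicode" returns a str rather than bytes, which avoids the b'...' prefix when printing under Python 3. If the whole tree comes from an HTML parser, lxml.html.tostring is probably the more convenient choice, since there is no method argument to forget.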