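The Python library lxml seems to offer several builders for generating HTML documents. What are the differences between them?
However, these generate plain HTML, not XHTML. I could add the xmlns declaration by hand, but that is inelegant. So what is the recommended way to generate XHTML documents with lxml?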
lxml.builder.E
Example from http://lxml.de/tutorial.html#the-e-factory:
>>> from lxml import etree  # needed for etree.XML() below
>>> from lxml.builder import E
>>> def CLASS(*args):  # class is a reserved word in Python
...     return {"class": ' '.join(args)}
>>> html = page = (
...   E.html(  # create an Element called "html"
...     E.head(
...       E.title("This is a sample document")
...     ),
...     E.body(
...       E.h1("Hello!", CLASS("title")),
...       E.p("This is a paragraph with ", E.b("bold"), " text in it!"),
...       E.p("This is another paragraph, with a", "\n ",
...         E.a("link", href="http://www.python.org"), "."),
...       E.p("Here are some reserved characters: <spam&egg>."),
...       etree.XML("<p>And finally an embedded XHTML fragment.</p>"),
...     )
...   )
... )
lxml.html.builder
Example from http://lxml.de/lxmlhtml.html#creating-html-with-the-e-factory:
>>> import lxml.html  # needed for lxml.html.fromstring() below
>>> from lxml.html import builder as E
>>> from lxml.html import usedoctest
>>> html = E.HTML(
...   E.HEAD(
...     E.LINK(rel="stylesheet", href="great.css", type="text/css"),
...     E.TITLE("Best Page Ever")
...   ),
...   E.BODY(
...     E.H1(E.CLASS("heading"), "Top News"),
...     E.P("World News only on this page", style="font-size: 200%"),
...     "Ah, and here's some more text, by the way.",
...     lxml.html.fromstring("<p>... and this is a parsed fragment ...</p>")
...   )
... )
Answer 0 (score: 0)
"The Python library lxml seems to offer several builders for generating HTML documents. What are the differences between them?"
lxml.html.builder uses the factory pattern:
import lxml.html  # needed for lxml.html.fromstring() below
from lxml.html import builder as E
from lxml.html import usedoctest

html = E.HTML(
    E.HEAD(
        E.LINK(rel="stylesheet", href="great.css", type="text/css"),
        E.TITLE("Best Page Ever")
    ),
    E.BODY(
        E.H1(E.CLASS("heading"), "Top News"),
        E.P("World News only on this page", style="font-size: 200%"),
        "Ah, and here's some more text, by the way.",
        lxml.html.fromstring("<p>... and this is a parsed fragment ...</p>")
    )
)
lxml.builder uses the prototype pattern:
from lxml import etree  # needed for etree.XML() below
from lxml.builder import E

def CLASS(*args):  # class is a reserved word in Python
    return {"class": ' '.join(args)}

html = page = (
    E.html(  # create an Element called "html"
        E.head(
            E.title("This is a sample document")
        ),
        E.body(
            E.h1("Hello!", CLASS("title")),
            E.p("This is a paragraph with ", E.b("bold"), " text in it!"),
            E.p("This is another paragraph, with a", "\n ",
                E.a("link", href="http://www.python.org"), "."),
            E.p("Here are some reserved characters: <spam&egg>."),
            etree.XML("<p>And finally an embedded XHTML fragment.</p>"),
        )
    )
)
"I could add the xmlns declaration by hand, but that is inelegant."
XSLT would be another option:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns="http://www.w3.org/1999/xhtml">
  <xsl:output method="xml" encoding="utf-8" version="" indent="yes" standalone="no" media-type="text/html" omit-xml-declaration="no" doctype-system="about:legacy-compat" />
  <xsl:template match="/">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <xsl:copy-of select="."/>
    </html>
  </xsl:template>
</xsl:stylesheet>
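One way to run such a transformation from Python is lxml's own etree.XSLT. The sketch below is only an illustration and does not use the stylesheet above: it assumes a different, hypothetical stylesheet that recreates every element in the XHTML namespace, because xsl:copy-of, as far as I understand it, copies elements together with their original (empty) namespace. The names XHTML_NS, STYLESHEET, plain and xhtml are made up for this example.

```python
from lxml import etree
from lxml.builder import E

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical stylesheet (not the one from the answer): rebuild each
# element under the same local name, but in the XHTML namespace.
STYLESHEET = """\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="utf-8"/>
  <xsl:template match="*">
    <xsl:element name="{local-name()}"
                 namespace="http://www.w3.org/1999/xhtml">
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
"""

# Compile the stylesheet once; the result is a callable transformer.
transform = etree.XSLT(etree.XML(STYLESHEET))

# A namespace-less tree built with the plain lxml.builder.E factory.
plain = E.html(
    E.head(E.title("Test page")),
    E.body(E.p("Hello world", {"class": "intro"})),
)

# Apply the transformation; text nodes are carried over by XSLT's
# built-in template rules.
xhtml = transform(plain)
root = xhtml.getroot()
print(etree.tostring(root, pretty_print=True).decode("utf-8"))
```

After the transformation every element tag carries the XHTML namespace in the tree itself, not just in the serialized text.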
Answer 1 (score: 0)
Mixing ElementMaker and E from lxml.builder worked well for me:
from lxml import etree
from lxml.builder import ElementMaker, E

# M declares the XHTML namespace as the default namespace on the root element
M = ElementMaker(namespace=None,
                 nsmap={None: "http://www.w3.org/1999/xhtml"})
html = M.html(E.head(E.title("Test page")),
              E.body(E.p("Hello world")))
result = etree.tostring(html,
                        xml_declaration=True,
                        doctype='<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">',
                        encoding='utf-8',
                        standalone=False,
                        with_tail=False,
                        method='xml',
                        pretty_print=True)
# tostring() returns bytes when an encoding is given, so decode for printing
print(result.decode('utf-8'))
The result is:
<?xml version='1.0' encoding='utf-8' standalone='no'?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Test page</title>
  </head>
  <body>
    <p>Hello world</p>
  </body>
</html>
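A variation on this, offered as a sketch rather than as the recommended lxml way: a single ElementMaker can be given both namespace= and nsmap=, so that every element it creates is placed in the XHTML namespace in the tree, not only the root. As far as I can tell, the elements created with the plain E factory above stay in no namespace in memory and only look namespaced once serialized. The names XHTML_NS and X are assumptions for this example.

```python
from lxml import etree
from lxml.builder import ElementMaker

XHTML_NS = "http://www.w3.org/1999/xhtml"

# One maker for every element: namespace= puts each tag into the XHTML
# namespace, nsmap= declares it as the default (unprefixed) namespace.
X = ElementMaker(namespace=XHTML_NS, nsmap={None: XHTML_NS})

html = X.html(
    X.head(X.title("Test page")),
    X.body(X.p("Hello world")),
)

result = etree.tostring(
    html,
    xml_declaration=True,
    doctype='<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" '
            '"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">',
    encoding="utf-8",
    pretty_print=True,
)
print(result.decode("utf-8"))
```

With this setup, namespace-aware lookups such as html.find("{http://www.w3.org/1999/xhtml}body") work on the freshly built tree without a serialize-and-reparse round trip.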
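Answer 2 (score: 0)
"The Python library lxml seems to offer several builders for generating HTML documents. What are the differences between them?"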
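lxml.builder.E gives you an XML document whose elements have the class <class 'lxml.etree._Element'>, which means the object is not aware of being (and possibly is not) an HTML document.
lxml.html.builder gives you a <class 'lxml.html.HtmlElement'>, which means you have an object that knows it is HTML and offers HTML-specific methods and attributes such as e.body or e.make_links_absolute().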
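The class difference described above can be checked directly. This is a minimal sketch; the aliases XML_E and HTML_E and the example.com URL are assumptions chosen for illustration.

```python
from lxml import etree
import lxml.html
from lxml.builder import E as XML_E
from lxml.html import builder as HTML_E

# The same link, built once with each builder.
xml_link = XML_E.a("link", href="/about")
html_link = HTML_E.A("link", href="/about")

print(type(xml_link))   # generic XML element class
print(type(html_link))  # HTML-aware element class

# HTML-specific helpers exist only on the HTML-aware element.
print(hasattr(xml_link, "make_links_absolute"))
html_link.make_links_absolute("http://example.com/")
print(html_link.get("href"))
```

Both objects serialize to similar markup, so the choice mostly matters when you want to keep working with the tree through the HTML helper methods.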