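I have an XML file in which I need to replace the value of the class attribute, depending on the dfn text of each p element. So I have an HTML file like this: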
<html>
<head></head>
<body>
<p class ='person'><dfn>New-York</dfn>
<p class = 'place'><dfn>John Doe</dfn>
</body>
</html>
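I want to parse this document and replace the value of every class attribute with the correct one. My script already has a set of conditions that decide whether a dfn text is a place or a person. So I would like to get the same HTML file as output, but with the correct classes: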
<html>
<head></head>
<body>
<p class ='**place**'><dfn>New-York</dfn>
<p class = '**person**'><dfn>John Doe</dfn>
</body>
</html>
So far I have tried finding the p ancestor of each dfn and its 'class' attribute, then replacing the value with the replace() function, but it doesn't really work:
from lxml import etree

filename = open('file.html', 'r+')
tree = etree.parse(filename)

def f1():
    for dfn in tree.getiterator('dfn'):
        def_text = dfn.text
        if def_text == 'New York':  # a list of conditions in my real script; New York is only an example
            class1 = ''.join(dfn.xpath('ancestor::p//@class'))
            filename.write(class1.replace('person', 'place'))
All I get is the same file, with a line 'place' appended at the end...
Answer 0 (score: 0)
Use lxml with XSLT to transform your HTML, for example:
from lxml import etree
h = '''<html>
<head></head>
<body>
<p class ='person'><dfn>New-York</dfn></p>
<p class = 'place'><dfn>John Doe</dfn></p>
</body>
</html>'''
doc = etree.fromstring(h, etree.HTMLParser())
xsl = '''<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="p">
    <xsl:variable name="original-class" select="string(@class)" />
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:if test="dfn[text()='New-York']">
        <xsl:attribute name="class">
          <xsl:value-of select="concat('**', $original-class, '**')"/>
        </xsl:attribute>
      </xsl:if>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>'''
xslt_root = etree.XML(xsl)
transform = etree.XSLT(xslt_root)
result_tree = transform(doc)
print(result_tree)
Output:
$ python x.py
<html>
<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"></head>
<body>
<p class="**person**"><dfn>New-York</dfn></p>
<p class="place"><dfn>John Doe</dfn></p>
</body>
</html>
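
For comparison, the same kind of rewrite can probably be done without XSLT by setting the attribute on the parsed tree and serializing it afterwards, which is the step missing from the attempt in the question (writing a string to the open file only appends text to it). The following is a minimal sketch, not the answer's method: `classify`, `KNOWN_PLACES`, `KNOWN_PEOPLE` and `fix_classes` are hypothetical names standing in for the conditions in the asker's real script.

```python
from lxml import etree

# Hypothetical lookup tables standing in for the asker's real conditions.
KNOWN_PLACES = {"New-York"}
KNOWN_PEOPLE = {"John Doe"}


def classify(text):
    """Return 'place', 'person', or None when the text is not recognised."""
    if text in KNOWN_PLACES:
        return "place"
    if text in KNOWN_PEOPLE:
        return "person"
    return None


def fix_classes(html_text):
    """Parse the HTML, correct the class of each <p> that wraps a <dfn>, and serialize it."""
    doc = etree.fromstring(html_text, etree.HTMLParser())
    for dfn in doc.iter("dfn"):
        parent = dfn.getparent()
        new_class = classify((dfn.text or "").strip())
        # Only touch <p> parents whose <dfn> text was recognised.
        if parent is not None and parent.tag == "p" and new_class is not None:
            parent.set("class", new_class)
    return etree.tostring(doc, method="html", encoding="unicode")


html_text = """<html>
<head></head>
<body>
<p class ='person'><dfn>New-York</dfn></p>
<p class = 'place'><dfn>John Doe</dfn></p>
</body>
</html>"""

fixed = fix_classes(html_text)
print(fixed)
```

To update the file on disk, the returned string would be written to a file opened in 'w' mode, replacing the old content instead of appending to it.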