data.xml
<?xml version="1.0" encoding="UTF-8"?>
<ArticleSet>
<Article>
<LastName>Bojarski</LastName>
<ForeName>-</ForeName>
<Affiliation>-</Affiliation>
</Article>
<Article>
<LastName>Genç</LastName>
<ForeName>Yasemin</ForeName>
<Affiliation>fgjfgnfgn</Affiliation>
</Article>
</ArticleSet>
Sample code
from lxml import etree
dom = etree.parse('data.xml')
root = dom.getroot()
# Remove every Article whose Affiliation text is exactly "-"
for article in dom.xpath('Article[Affiliation="-"]'):
    root.remove(article)
dom.write('output.xml')
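This code removes the articles whose affiliation equals -, i.e. whose affiliation tag looks like <Affiliation>-</Affiliation>.
When I store the remaining output in output.xml, the Unicode character in Genç is written as a numeric character reference, Gen&#231;, and I want to store it as is.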
Code output
<ArticleSet>
<Article>
<LastName>Gen&#231;</LastName>
<ForeName>Yasemin</ForeName>
<Affiliation>fgjfgnfgn</Affiliation>
</Article>
</ArticleSet>
Required output
<ArticleSet>
<Article>
<LastName>Genç</LastName>
<ForeName>Yasemin</ForeName>
<Affiliation>fgjfgnfgn</Affiliation>
</Article>
</ArticleSet>
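Answer 0 (score: 0)
The write method of the parsed tree (dom.write in your code) has an encoding parameter. If you do not pass it, lxml serializes as ASCII and escapes every non-ASCII character as a numeric character reference. You can also pass xml_declaration=True to declare the encoding in the output document.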
dom.write('output.xml', encoding='utf-8', xml_declaration=True)
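To see the difference without touching any files, here is a minimal sketch that does the same filtering in memory and serializes the result both ways. The inline XML string and the names ascii_bytes and utf8_bytes are only for illustration; it relies on etree.tostring accepting the same encoding and xml_declaration arguments as write.

```python
from lxml import etree

# Same data as data.xml above, held in memory so nothing is written to disk.
XML = """<ArticleSet>
<Article><LastName>Bojarski</LastName><ForeName>-</ForeName><Affiliation>-</Affiliation></Article>
<Article><LastName>Genç</LastName><ForeName>Yasemin</ForeName><Affiliation>fgjfgnfgn</Affiliation></Article>
</ArticleSet>""".encode("utf-8")

root = etree.fromstring(XML)

# Drop the articles whose Affiliation text is exactly "-".
for article in root.xpath('Article[Affiliation="-"]'):
    root.remove(article)

# No encoding given: ASCII output, so "ç" is escaped as a character reference.
ascii_bytes = etree.tostring(root)

# Explicit UTF-8 plus an XML declaration: "ç" is written as is.
utf8_bytes = etree.tostring(root, encoding="utf-8", xml_declaration=True)

print(ascii_bytes.decode("ascii"))
print(utf8_bytes.decode("utf-8"))
```

Both results describe the same document to any XML parser; the explicit encoding only changes how the character is spelled in the file.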