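I have an input XML file with some incorrect namespaces. I tried to fix them with ElementTree, but without success.
Example input (the ns0: prefix here could just as well be ns:, p:, n:, etc.):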
<ns0:Invoice xmlns:ns0="http://invoices.com/docs/xsd/invoices/v1.2" version="FPR12">
<InvoiceHeader>
<DataH>data header</DataH>
</InvoiceHeader>
<InvoiceBody>
<DataB>data body</DataB>
</InvoiceBody>
</ns0:Invoice>
Desired output (the namespace on the root element must have no prefix, and some inner tags must be declared with xmlns=""):
<Invoice xmlns="http://invoices.com/docs/xsd/invoices/v1.2" version="FPR12">
<InvoiceHeader xmlns="">
<DataH>data header</DataH>
</InvoiceHeader>
<InvoiceBody xmlns="">
<DataB>data body</DataB>
</InvoiceBody>
</Invoice>
I tried changing the root namespace as follows, but the resulting file is unchanged:
import xml.etree.ElementTree as ET
tree = ET.parse('./cache/test.xml')
root = tree.getroot()
root.tag = '{http://invoices.com/docs/xsd/invoices/v1.2}Invoice'
xml = ET.tostring(root, encoding="unicode")
with open('./cache/output.xml', 'wt') as f:
    f.write(xml)
If I instead try
root.tag = 'Invoice'
it produces a tag with no namespace at all.
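For anyone hitting the same wall, here is a small sketch of why both attempts behave that way. It uses only the standard library and a shortened inline copy of the input (the SRC string and the variable names are mine, not from the original file). As far as I can tell, the parser already stores the root tag in `{uri}local` form, so the first assignment changes nothing, and the `ns0` prefix is generated again at serialisation time.

```python
import xml.etree.ElementTree as ET

NS = "http://invoices.com/docs/xsd/invoices/v1.2"

# Shortened inline copy of the example input (assumption: same structure).
SRC = (
    f'<ns0:Invoice xmlns:ns0="{NS}" version="FPR12">'
    "<InvoiceHeader><DataH>data header</DataH></InvoiceHeader>"
    "</ns0:Invoice>"
)

root = ET.fromstring(SRC)

# The parser drops the prefix and keeps the tag as "{uri}local",
# so assigning that same string to root.tag is a no-op.
already_qualified = root.tag == f"{{{NS}}}Invoice"

# The children carry no prefix and there is no default namespace,
# so their tags are plain, unqualified names.
child_tags = [child.tag for child in root]

# On output, ElementTree generates a prefix for any namespace URI
# that has no prefix registered for it.
with_generated_prefix = ET.tostring(root, encoding="unicode")

# A bare tag name means "no namespace", so no xmlns declaration is written.
root.tag = "Invoice"
without_namespace = ET.tostring(root, encoding="unicode")
```

So the tag itself was never the problem; what needs to change is which prefix the serialiser picks for that URI.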
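Please let me know whether I am making a mistake, or whether I should switch to another library or try replacing the string with a regular expression.
Thanks in advance.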
Answer 0 (score: 0)
I don't know whether this is still useful to anyone, but I managed to fix the namespaces using lxml and the following code.
from lxml import etree
from copy import deepcopy
tree = etree.parse('./cache/test.xml')
# create a new root without prefix in the namespace
NSMAP = {None : "http://invoices.com/docs/xsd/invoices/v1.2"}
root = etree.Element("{http://invoices.com/docs/xsd/invoices/v1.2}Invoice", nsmap = NSMAP)
# copy attributes from original root
for attr, value in tree.getroot().items():
    root.set(attr, value)
# deep copy of children (adding empty namespace in some tags)
# iterate over the original root directly (getchildren() is deprecated)
for child in tree.getroot():
    if child.tag in ('InvoiceHeader', 'InvoiceBody'):
        child.set("xmlns", "")
    root.append(deepcopy(child))
xml = etree.tostring(root, pretty_print=True)
with open('./cache/output.xml', 'wb') as f:
    f.write(xml)
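For comparison, a similar result seems to be achievable with the standard library alone, so this is only a sketch of an alternative and not a replacement for the lxml code above. It assumes that registering the empty prefix with `ET.register_namespace` is acceptable (the registration is global for the process) and that writing `xmlns=""` as a plain attribute is good enough for the consumer. The helper name `fix_namespaces` and the inline SRC string are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "http://invoices.com/docs/xsd/invoices/v1.2"


def fix_namespaces(xml_text, unqualified=("InvoiceHeader", "InvoiceBody")):
    """Hypothetical helper: serialise NS as the default namespace and
    mark the listed direct children with xmlns=""."""
    # Process-wide setting: NS will be written without a prefix from now on.
    ET.register_namespace("", NS)
    root = ET.fromstring(xml_text)
    for child in root:
        if child.tag in unqualified:
            # ElementTree writes this out as an ordinary attribute,
            # which a parser then reads as a namespace undeclaration.
            child.set("xmlns", "")
    return ET.tostring(root, encoding="unicode")


# Inline copy of the example input from the question.
SRC = (
    f'<ns0:Invoice xmlns:ns0="{NS}" version="FPR12">'
    "<InvoiceHeader><DataH>data header</DataH></InvoiceHeader>"
    "<InvoiceBody><DataB>data body</DataB></InvoiceBody>"
    "</ns0:Invoice>"
)

fixed = fix_namespaces(SRC)

# Parse the result again to check that the namespaces round-trip.
reparsed = ET.fromstring(fixed)
```

After reparsing, the root should still be in the invoice namespace while the two children stay unqualified, which is what the desired output in the question asks for.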