Parsing an XML file with null attributes in Python

Asked: 2018-07-28 16:22:52

Tags: xml python-3.x xml-parsing

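I have many rows in an XML file, and I am trying to write a Python script that iterates over those rows and converts the null attributes (empty elements) to the format AWS expects. For example, my tree looks like: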

<TRANSFORMATION>
<ID_RSSD_PREDECESSOR>28</ID_RSSD_PREDECESSOR><ID_RSSD_SUCCESSOR>75026</ID_RSSD_SUCCESSOR>
<D_DT_TRANS/>
</TRANSFORMATION>

https://docs.aws.amazon.com/glue/latest/dg/custom-classifier.html#custom-classifier-xml

How can I access the attributes that have a null value (such as <D_DT_TRANS/>) and update them to:

<D_DT_TRANS></D_DT_TRANS>

1 answer:

Answer 0 (score: 0)

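You can use BeautifulSoup to parse and modify the XML document. This example finds every tag that has no content and inserts an empty string into it, effectively expanding <tag/> to <tag></tag>: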

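# The 'xml' parser used below requires the lxml package to be installed.
from bs4 import BeautifulSoup

data = """<TRANSFORMATION>
<ID_RSSD_PREDECESSOR>28</ID_RSSD_PREDECESSOR><ID_RSSD_SUCCESSOR>75026</ID_RSSD_SUCCESSOR>
<D_DT_TRANS/>
</TRANSFORMATION>"""

# Parse the document as XML
xml_data = BeautifulSoup(data, 'xml')

# Find every tag with no children and give it an empty string,
# so it is serialized as <tag></tag> rather than <tag/>
for tag in xml_data.find_all(lambda t: len(t.contents) == 0):
    tag.string = ""

print(xml_data.prettify())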

This prints:

<?xml version="1.0" encoding="utf-8"?>
<TRANSFORMATION>
 <ID_RSSD_PREDECESSOR>
  28
 </ID_RSSD_PREDECESSOR>
 <ID_RSSD_SUCCESSOR>
  75026
 </ID_RSSD_SUCCESSOR>
 <D_DT_TRANS>
 </D_DT_TRANS>
</TRANSFORMATION>
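
If adding BeautifulSoup and lxml as dependencies is not an option, the standard library may be enough. The following is a minimal sketch, not taken from the approach above, that assumes the document fits in memory and that only the serialization of empty elements needs to change. It relies on the `short_empty_elements` keyword of `xml.etree.ElementTree.tostring` (available since Python 3.4); the variable name `expanded` is only illustrative.

```python
import xml.etree.ElementTree as ET

data = """<TRANSFORMATION>
<ID_RSSD_PREDECESSOR>28</ID_RSSD_PREDECESSOR><ID_RSSD_SUCCESSOR>75026</ID_RSSD_SUCCESSOR>
<D_DT_TRANS/>
</TRANSFORMATION>"""

# Parse the XML string into an element tree
root = ET.fromstring(data)

# Serialize with short_empty_elements=False so that elements without
# content are written as <tag></tag> instead of the self-closing form.
# encoding="unicode" makes tostring return a str rather than bytes.
expanded = ET.tostring(root, encoding="unicode", short_empty_elements=False)

print(expanded)
```

For a file on disk, the same idea should carry over by loading it with `ET.parse(...)` and calling the tree's `write(...)` method with `short_empty_elements=False`. Unlike `prettify()`, this does not re-indent the document, so existing text content is left untouched.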