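I have many rows in an XML file, and I am trying to write a Python script that iterates over them and rewrites the empty (null) elements in the format AWS expects. For example, my tree looks like this: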
<TRANSFORMATION>
<ID_RSSD_PREDECESSOR>28</ID_RSSD_PREDECESSOR><ID_RSSD_SUCCESSOR>75026</ID_RSSD_SUCCESSOR>
<D_DT_TRANS/>
</TRANSFORMATION>
https://docs.aws.amazon.com/glue/latest/dg/custom-classifier.html#custom-classifier-xml
How can I access the element that has an empty value (<D_DT_TRANS/>) and update it to:
<D_DT_TRANS></D_DT_TRANS>
Answer 0 (score: 0)
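You can use BeautifulSoup to parse and modify the XML document. This example finds every tag with zero contents and inserts an empty string into it, effectively expanding <tag/> to <tag></tag>: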
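from bs4 import BeautifulSoup

data = """<TRANSFORMATION>
<ID_RSSD_PREDECESSOR>28</ID_RSSD_PREDECESSOR><ID_RSSD_SUCCESSOR>75026</ID_RSSD_SUCCESSOR>
<D_DT_TRANS/>
</TRANSFORMATION>"""

# The 'xml' parser requires the lxml package to be installed.
xml_data = BeautifulSoup(data, 'xml')

# Select every tag with no children and give it an empty string,
# so it serializes as <tag></tag> rather than <tag/>.
for tag in xml_data.find_all(lambda t: len(t.contents) == 0):
    tag.string = ""

print(xml_data.prettify())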
This prints:
<?xml version="1.0" encoding="utf-8"?>
<TRANSFORMATION>
<ID_RSSD_PREDECESSOR>
28
</ID_RSSD_PREDECESSOR>
<ID_RSSD_SUCCESSOR>
75026
</ID_RSSD_SUCCESSOR>
<D_DT_TRANS>
</D_DT_TRANS>
</TRANSFORMATION>
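Note that prettify() is meant for display: it adds line breaks and indentation around the text inside each tag, so the value 28 is no longer stored as exactly "28". If that matters for your data, one possible alternative is the standard library's xml.etree.ElementTree, whose serializer accepts short_empty_elements=False. The following is only a minimal sketch using the sample from the question; it assumes the whole document fits in memory and that rewriting every empty element (not just D_DT_TRANS) is acceptable:

```python
import xml.etree.ElementTree as ET

data = """<TRANSFORMATION>
<ID_RSSD_PREDECESSOR>28</ID_RSSD_PREDECESSOR><ID_RSSD_SUCCESSOR>75026</ID_RSSD_SUCCESSOR>
<D_DT_TRANS/>
</TRANSFORMATION>"""

# Parse the sample string into an element tree.
root = ET.fromstring(data)

# short_empty_elements=False tells the serializer to write <tag></tag>
# instead of <tag/> for every element that has no content.
expanded = ET.tostring(root, encoding="unicode", short_empty_elements=False)

print(expanded)
```

For a file on disk, ET.parse() followed by tree.write(..., short_empty_elements=False) should behave the same way, since write() takes the same keyword argument.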