I have an XML file like the one below (shortened, of course; some of the tags repeat):
<file version=3.6 xmlns:xsi="http://ww.w3.org/2009/XMLSchemainstance">
<Date>2014-05-12</Date>
<creationTime>2014-05-12 :56:54</creationTime>
<location>http://www.w.org/2009/XMLSchemainstance/output/official/.20140512.PNL.xml.gz</location>
<contentType>nnn</contentType>
<signOffBy>gft_test_fo</signOffBy>
<signOffGroup>BRFPOOLNEW_SO</signOffGroup>
<book>
<riskBook>BRFPOOL</riskBook>
<trade>
<tradeId>00000000000009752</tradeId>
<subTrade>
<riskTrade>00000000000009752</riskTrade>
<riskProductType>BOND_NF</riskProductType>
<reportCollection>
<report>
<valuationSource>RISK_ENGINE</valuationSource>
<reportName>BRZ_HGS_PPTCC</reportName>
<riskPoint>
<value>0.00</value>
<valueCcy>BRL</valueCcy>
</riskPoint>
</report>
<report>
<valuationSource>RISK_ENGINE</valuationSource>
<reportName>BRZ_HGS_PPTCC</reportName>
<riskPoint>
<value>0.00</value>
<valueCcy>BRL</valueCcy>
</riskPoint>
</report>
</reportCollection>
</subTrade>
</trade>
</book>
</file>
I want the output as CSV, like this:
Date,creationTime,location,contentType,signOffBy,signOffGroup,riskBook,tradeId,riskTrade,riskProductType,reportName,valuationSource,reportName,value,valueCcy
2014-05-12,2014-05-12 :56:54,http://ww.w3.org/2009/XMLSchemainstance/output/official/GLOBAL/GLOBAL_EM/BRFPOOL.20140512.PNL.xml.gz,nnn,gft_test_fo,BRFPOOLNEW_SO,BRFPOOL,00000000000009752,00000000000009752,BOND_NF,RISK_ENGINE,BRZ_HGS_PPTCC,0.00,BRL
2014-05-12,2014-05-12 :56:54,http://ww.w3.org/2009/XMLSchemainstance/output/official/GLOBAL/GLOBAL_EM/BRFPOOL.20140512.PNL.xml.gz,PNL,gft_test_fo,BRFPOOLNEW_SO,BRFPOOL,00000000000009752,00000000000009752,BOND_NF,RISK_ENGINE,BRZ_HGS_PPTCC,0.00,BRL
This is the code I have tried so far:
import xml.etree.ElementTree as etree
root=etree.parse('./emp.xml').getroot()
for b in zip(root.findall("book/trade/tradeId"),root.findall ("book/trade/subTrade/riskTrade"),root.findall("book/trade/subTrade/riskProductType"),root.findall("book/trade/subTrade/reportcollectin/report/valuationSource"),("book/trade/subTrade/reportcollectin/report/reportName"),("book/trade/subTrade/reportcollectin/report/refCurve"),("book/trade/subTrade/reportcollectin/report/riskPoint/value"),("book/trade/subTrade/reportcollectin/report/riskPoint/valueCcy")
print (",".join([x.text for x in b]))
I am not getting the output I expect. Please help.
Answer 0 (score: 2)
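Apart from the errors in the XML (no closing tags on <creationTime> and <file>), the errors in the Python file (no closing quote on the file name), and some misspelled paths such as reportcollectin, zip is the wrong tool when the lists have different lengths. Its result is always as long as the shortest input, and one of the findall searches in your code returns an empty list, so the final result ends up empty as well.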
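To make the truncation concrete, here is a minimal sketch. The inline XML is a made-up fragment rather than the asker's full file, and the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down fragment: one trade with two reports.
xml_text = """
<book>
  <trade>
    <tradeId>00000000000009752</tradeId>
    <subTrade>
      <reportCollection>
        <report><reportName>BRZ_HGS_PPTCC</reportName></report>
        <report><reportName>BRZ_HGS_PPTCC</reportName></report>
      </reportCollection>
    </subTrade>
  </trade>
</book>
"""
root = ET.fromstring(xml_text)

trade_ids = root.findall("trade/tradeId")  # 1 element
report_names = root.findall("trade/subTrade/reportCollection/report/reportName")  # 2 elements
# A misspelled path matches nothing, so findall returns an empty list.
misspelled = root.findall("trade/subTrade/reportcollectin/report/reportName")

# zip stops at the shortest input: one report is silently dropped here...
pairs = list(zip(trade_ids, report_names))
# ...and a single empty input makes the whole result empty.
empty = list(zip(trade_ids, misspelled))

print(len(pairs), len(empty))
```

Because zip never raises on mismatched lengths, a typo in one path produces no output at all rather than an error, which matches the symptom in the question.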
The best approach is to read the top-level fields first (such as Date and creationTime) and then loop over the books and their reports.
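A minimal sketch of that approach, assuming the element names from the question's XML and a well-formed input. The function name xml_to_rows, the column list, and the sample location value are illustrative assumptions, not part of the original post.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Column names are an assumption based on the tags in the question's XML.
HEADER_TAGS = ["Date", "creationTime", "location", "contentType",
               "signOffBy", "signOffGroup"]
COLUMNS = HEADER_TAGS + ["riskBook", "tradeId", "riskTrade", "riskProductType",
                         "valuationSource", "reportName", "value", "valueCcy"]


def xml_to_rows(root):
    """Return one flat row per <report>, repeating the parent fields."""
    # Top-level fields appear once per file; a missing tag becomes "".
    header = [root.findtext(tag, default="") for tag in HEADER_TAGS]
    rows = []
    for book in root.findall("book"):
        risk_book = book.findtext("riskBook", default="")
        for trade in book.findall("trade"):
            trade_id = trade.findtext("tradeId", default="")
            for sub in trade.findall("subTrade"):
                sub_fields = [sub.findtext("riskTrade", default=""),
                              sub.findtext("riskProductType", default="")]
                for report in sub.findall("reportCollection/report"):
                    # Only the first <riskPoint> of each report is read here.
                    rows.append(header + [risk_book, trade_id] + sub_fields + [
                        report.findtext("valuationSource", default=""),
                        report.findtext("reportName", default=""),
                        report.findtext("riskPoint/value", default=""),
                        report.findtext("riskPoint/valueCcy", default=""),
                    ])
    return rows


# Hypothetical well-formed sample; the location value is a placeholder.
SAMPLE = """<file version="3.6">
<Date>2014-05-12</Date>
<creationTime>2014-05-12 :56:54</creationTime>
<location>example.PNL.xml.gz</location>
<contentType>nnn</contentType>
<signOffBy>gft_test_fo</signOffBy>
<signOffGroup>BRFPOOLNEW_SO</signOffGroup>
<book><riskBook>BRFPOOL</riskBook><trade>
<tradeId>00000000000009752</tradeId><subTrade>
<riskTrade>00000000000009752</riskTrade>
<riskProductType>BOND_NF</riskProductType>
<reportCollection>
<report><valuationSource>RISK_ENGINE</valuationSource>
<reportName>BRZ_HGS_PPTCC</reportName>
<riskPoint><value>0.00</value><valueCcy>BRL</valueCcy></riskPoint></report>
<report><valuationSource>RISK_ENGINE</valuationSource>
<reportName>BRZ_HGS_PPTCC</reportName>
<riskPoint><value>0.00</value><valueCcy>BRL</valueCcy></riskPoint></report>
</reportCollection></subTrade></trade></book>
</file>"""

rows = xml_to_rows(ET.fromstring(SAMPLE))

# Write to an in-memory buffer; swap in open("out.csv", "w", newline="") for a file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(COLUMNS)
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

For a file on disk, ET.parse(path).getroot() would replace ET.fromstring(SAMPLE). Nesting the loops keeps each report attached to its own trade and book, which flat findall calls over the whole tree cannot guarantee.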