我收到一个包含许多子元素的XML文档,我需要提取信息,然后导出到CSV或文本文档,以便我可以导入到Quickbooks。 XML树如下所示:
<MODocuments>
<MODocument>
<Document>TX1126348</Document>
<DocStatus>P</DocStatus>
<DateIssued>20180510</DateIssued>
<ApplicantName>COMPANY FRUIT & VEGETABLE</ApplicantName>
<MOLots>
<MOLot>
<LotID>A</LotID>
<ProductVariety>Yellow</ProductVariety>
<TotalPounds>15500</TotalPounds>
</MOLot>
<MOLot>
<LotID>B</LotID>
<ProductVariety>Yellow</ProductVariety>
<TotalPounds>175</TotalPounds>
</MOLot>
<MOLot>
<LotID>C</LotID>
<ProductVariety>Yellow</ProductVariety>
<TotalPounds>7500</TotalPounds>
</MOLot>
<MOLot>
<LotID>D</LotID>
<ProductVariety>Yellow</ProductVariety>
<TotalPounds>300</TotalPounds>
</MOLot>
</MOLots>
</MODocument>
<MODocument>
<Document>TX1126349</Document>
<DocStatus>P</DocStatus>
<DateIssued>20180511</DateIssued>
<ApplicantName>COMPANY FRUIT & VEGETABLE</ApplicantName>
<MOLots>
<MOLot>
<LotID>A</LotID>
<ProductVariety>Yellow</ProductVariety>
<TotalPounds>25200</TotalPounds>
</MOLot>
<MOLot>
<LotID>B</LotID>
<ProductVariety>Yellow</ProductVariety>
<TotalPounds>16800</TotalPounds>
</MOLot>
</MOLots>
</MODocument>
<MODocument>
<Document>TX1126350</Document>
<DateIssued>20180511</DateIssued>
<ApplicantName>COMPANY FRUIT & VEGETABLE</ApplicantName>
<MOLots>
<MOLot>
<LotID>A</LotID>
<ProductVariety>Yellow</ProductVariety>
<TotalPounds>14100</TotalPounds>
</MOLot>
</MOLots>
</MODocument>
</MODocuments>
我需要从每个MODocument父项中提取TotalPounds,因此输出如下所示: 文档编号,申请人姓名和总计为这一个文档中的所有MOL添加了总计。
TX1126348 COMPANY FRUIT & VEGETABLE 23475
TX1126349 COMPANY FRUIT & VEGETABLE 42000
TX1126350 COMPANY FRUIT & VEGETABLE 14100
以下是与我合作的代码:
import xml.etree.ElementTree as ET
tree = ET.parse('TX_959_20180514131311.xml')
root = tree.getroot()
docCert = []
docComp = []
totalPounds=[]
for MODocuments in root:
for MODocument in MODocuments:
docCert.append(MODocument.find('Document').text)
docComp.append(MODocument.find('ApplicantName').text)
for MOLots in MODocument:
for MOLot in MOLots:
totalPounds.append(int(MOLot.find('TotalPounds').text))
for i in range(len(docCert)):
print(i, docCert[i],' ', docComp[i], totalPounds[i])
这是我的输出,我不知道如何为每个文档添加总计..请帮忙。
0 TX1126348 COMPANY FRUIT & VEGETABLE 15500
1 TX1126349 COMPANY FRUIT & VEGETABLE 175
2 TX1126350 COMPANY FRUIT & VEGETABLE 7500
答案 0 :(得分:1)
totalPounds
中的项目似乎比docCert
或docComp
中的项目多。我想你需要做这样的事情:
for MODocuments in root:
for MODocument in MODocuments:
docCert.append(MODocument.find('Document').text)
docComp.append(MODocument.find('ApplicantName').text)
sub_total = 0
for MOLots in MODocument:
for MOLot in MOLots:
sub_total += int(MOLot.find('TotalPounds').text)
totalPounds.append(sub_total)
答案 1 :(得分:1)
如果你可以使用lxml,你可以让XPath sum()
函数为你提供所有TotalPounds。
示例...
from lxml import etree
import csv
tree = etree.parse("TX_959_20180514131311.xml")
with open("output.csv", "w", newline="") as csvfile:
csvwriter = csv.writer(csvfile, delimiter=",", quoting=csv.QUOTE_MINIMAL)
for mo_doc in tree.xpath("/MODocuments/MODocument"):
csvwriter.writerow([mo_doc.xpath("Document")[0].text,
mo_doc.xpath("ApplicantName")[0].text,
int(mo_doc.xpath("sum(MOLots/MOLot/TotalPounds)"))])
&#34; output.csv&#34; ...
的内容TX1126348,COMPANY FRUIT & VEGETABLE,23475
TX1126349,COMPANY FRUIT & VEGETABLE,42000
TX1126350,COMPANY FRUIT & VEGETABLE,14100
此外,通过使用csv
编写输出,您可以控制引号,分隔符等。