Question

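I received an XML document with many child elements. I need to extract information from it and export it to a CSV or text file that I can import into QuickBooks. The XML tree looks like this: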

<MODocuments>
  <MODocument>
    <Document>TX1126348</Document>
    <DocStatus>P</DocStatus>
    <DateIssued>20180510</DateIssued>
    <ApplicantName>COMPANY FRUIT &amp; VEGETABLE</ApplicantName>
    <MOLots>
      <MOLot>
        <LotID>A</LotID>
        <ProductVariety>Yellow</ProductVariety>
        <TotalPounds>15500</TotalPounds>
      </MOLot>
      <MOLot>
        <LotID>B</LotID>
        <ProductVariety>Yellow</ProductVariety>
        <TotalPounds>175</TotalPounds>
      </MOLot>
      <MOLot>
        <LotID>C</LotID>
        <ProductVariety>Yellow</ProductVariety>
        <TotalPounds>7500</TotalPounds>
      </MOLot>
      <MOLot>
        <LotID>D</LotID>
        <ProductVariety>Yellow</ProductVariety>
        <TotalPounds>300</TotalPounds>
      </MOLot>
    </MOLots>
  </MODocument>
  <MODocument>
    <Document>TX1126349</Document>
    <DocStatus>P</DocStatus>
    <DateIssued>20180511</DateIssued>
    <ApplicantName>COMPANY FRUIT &amp; VEGETABLE</ApplicantName>
    <MOLots>
      <MOLot>
        <LotID>A</LotID>
        <ProductVariety>Yellow</ProductVariety>
        <TotalPounds>25200</TotalPounds>
      </MOLot>
      <MOLot>
        <LotID>B</LotID>
        <ProductVariety>Yellow</ProductVariety>
        <TotalPounds>16800</TotalPounds>
      </MOLot>
    </MOLots>
  </MODocument>
  <MODocument>
    <Document>TX1126350</Document>
    <DateIssued>20180511</DateIssued>
    <ApplicantName>COMPANY FRUIT &amp; VEGETABLE</ApplicantName>
    <MOLots>
      <MOLot>
        <LotID>A</LotID>
        <ProductVariety>Yellow</ProductVariety>
        <TotalPounds>14100</TotalPounds>
      </MOLot>
    </MOLots>
  </MODocument>
</MODocuments>

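I need to extract TotalPounds from each MODocument parent, so that the output looks like the lines below: the document number, the applicant name, and the TotalPounds values of all MOLot elements in that one document added together.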

TX1126348   COMPANY FRUIT & VEGETABLE 23475
TX1126349   COMPANY FRUIT & VEGETABLE 42000
TX1126350   COMPANY FRUIT & VEGETABLE 14100

Here is the code I am working with:

import xml.etree.ElementTree as ET
tree = ET.parse('TX_959_20180514131311.xml')
root = tree.getroot()

docCert = []
docComp = []
totalPounds=[]

for MODocuments in root:
    for MODocument in MODocuments:
        docCert.append(MODocument.find('Document').text)
        docComp.append(MODocument.find('ApplicantName').text)
        for MOLots in MODocument:
            for MOLot in MOLots:
                totalPounds.append(int(MOLot.find('TotalPounds').text))

for i in range(len(docCert)):
    print(i, docCert[i],' ', docComp[i], totalPounds[i])

This is my output. I don't know how to add up the total for each document. Please help.

0 TX1126348   COMPANY FRUIT & VEGETABLE 15500
1 TX1126349   COMPANY FRUIT & VEGETABLE 175
2 TX1126350   COMPANY FRUIT & VEGETABLE 7500

Answer 1

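It looks like totalPounds ends up with more items than docCert or docComp: the inner loop appends one entry per MOLot, while the other two lists get one entry per document, so index i no longer refers to the same document in all three lists. I think you need to do something like this: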

for MODocuments in root:
    for MODocument in MODocuments:
        docCert.append(MODocument.find('Document').text)
        docComp.append(MODocument.find('ApplicantName').text)
        # Start a fresh running total for this document
        sub_total = 0
        for MOLots in MODocument:
            for MOLot in MOLots:
                sub_total += int(MOLot.find('TotalPounds').text)
        # Append once per document so all three lists stay the same length
        totalPounds.append(sub_total)
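
For anyone who wants to try the idea without the original file, here is a minimal self-contained sketch using only the standard library. It is not the code from the question: the `SAMPLE` string is a trimmed copy of the question's XML, `document_totals` is a hypothetical helper name, and it swaps the nested loops for `iter()`, which searches the whole subtree and so does not depend on how deeply MODocument is nested.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample in the same shape as the question's XML (assumption:
# the real file follows this structure).
SAMPLE = """\
<MODocuments>
  <MODocument>
    <Document>TX1126348</Document>
    <ApplicantName>COMPANY FRUIT &amp; VEGETABLE</ApplicantName>
    <MOLots>
      <MOLot><LotID>A</LotID><TotalPounds>15500</TotalPounds></MOLot>
      <MOLot><LotID>B</LotID><TotalPounds>175</TotalPounds></MOLot>
      <MOLot><LotID>C</LotID><TotalPounds>7500</TotalPounds></MOLot>
      <MOLot><LotID>D</LotID><TotalPounds>300</TotalPounds></MOLot>
    </MOLots>
  </MODocument>
  <MODocument>
    <Document>TX1126349</Document>
    <ApplicantName>COMPANY FRUIT &amp; VEGETABLE</ApplicantName>
    <MOLots>
      <MOLot><LotID>A</LotID><TotalPounds>25200</TotalPounds></MOLot>
      <MOLot><LotID>B</LotID><TotalPounds>16800</TotalPounds></MOLot>
    </MOLots>
  </MODocument>
</MODocuments>
"""


def document_totals(root):
    """Return one (document, applicant, total_pounds) tuple per MODocument."""
    rows = []
    # iter() walks the whole tree, so the nesting depth of MODocument does not matter.
    for mo_doc in root.iter("MODocument"):
        # Sum every TotalPounds found anywhere below this one document.
        total = sum(int(tp.text) for tp in mo_doc.iter("TotalPounds"))
        rows.append((mo_doc.findtext("Document"),
                     mo_doc.findtext("ApplicantName"),
                     total))
    return rows


rows = document_totals(ET.fromstring(SAMPLE))
for doc, applicant, total in rows:
    print(doc, applicant, total)
```

Keeping each document's values together in one tuple, rather than in three parallel lists, avoids the index mismatch described above.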

Answer 2

If you can use lxml, you can let the XPath sum() function add up all the TotalPounds values for you.

Example...

from lxml import etree
import csv

tree = etree.parse("TX_959_20180514131311.xml")

# newline="" lets the csv module control the line endings itself
with open("output.csv", "w", newline="") as csvfile:
    csvwriter = csv.writer(csvfile, delimiter=",", quoting=csv.QUOTE_MINIMAL)
    for mo_doc in tree.xpath("/MODocuments/MODocument"):
        # XPath sum() returns a float, so convert it to int for the output
        csvwriter.writerow([mo_doc.xpath("Document")[0].text,
                            mo_doc.xpath("ApplicantName")[0].text,
                            int(mo_doc.xpath("sum(MOLots/MOLot/TotalPounds)"))])

Contents of "output.csv"...

TX1126348,COMPANY FRUIT & VEGETABLE,23475
TX1126349,COMPANY FRUIT & VEGETABLE,42000
TX1126350,COMPANY FRUIT & VEGETABLE,14100

Also, by using csv to write the output, you can control quoting, delimiters, etc.
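
As a rough illustration of that control, the sketch below writes tab-delimited output with every text field quoted. It uses only the standard library: ElementTree's `findall()` accepts a path such as `MOLots/MOLot/TotalPounds` but has no XPath `sum()` function, so the summing is done in Python. The tab delimiter and `QUOTE_NONNUMERIC` are placeholders, since the thread does not say which format QuickBooks expects. The `SAMPLE` string is a trimmed copy of the question's XML, and `io.StringIO` stands in for a real output file.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Trimmed-down sample in the same shape as the question's XML (assumption:
# MODocuments is the root element, as in the XML shown in the question).
SAMPLE = """\
<MODocuments>
  <MODocument>
    <Document>TX1126349</Document>
    <ApplicantName>COMPANY FRUIT &amp; VEGETABLE</ApplicantName>
    <MOLots>
      <MOLot><LotID>A</LotID><TotalPounds>25200</TotalPounds></MOLot>
      <MOLot><LotID>B</LotID><TotalPounds>16800</TotalPounds></MOLot>
    </MOLots>
  </MODocument>
  <MODocument>
    <Document>TX1126350</Document>
    <ApplicantName>COMPANY FRUIT &amp; VEGETABLE</ApplicantName>
    <MOLots>
      <MOLot><LotID>A</LotID><TotalPounds>14100</TotalPounds></MOLot>
    </MOLots>
  </MODocument>
</MODocuments>
"""

root = ET.fromstring(SAMPLE)

# In-memory buffer; for a real file use open("output.txt", "w", newline="").
buffer = io.StringIO()
writer = csv.writer(buffer,
                    delimiter="\t",                # tab instead of comma
                    quoting=csv.QUOTE_NONNUMERIC,  # quote every non-numeric field
                    lineterminator="\n")

for mo_doc in root.findall("MODocument"):
    pounds = mo_doc.findall("MOLots/MOLot/TotalPounds")
    writer.writerow([mo_doc.findtext("Document"),
                     mo_doc.findtext("ApplicantName"),
                     sum(int(p.text) for p in pounds)])

tsv_text = buffer.getvalue()
print(tsv_text, end="")
```

Because the total is written as an int, `QUOTE_NONNUMERIC` leaves it unquoted while quoting the document number and applicant name.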

Adding integers in nested XML child elements using Python
