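How do I write data from multiple XML tags into multiple columns of a CSV?

Time: 2017-09-12 19:18:45

Tags: python-2.7 xml-parsing elementtree

I am trying to take the data from an API call that returns an XML object and parse a few data points into a CSV file, each one in its own column.

The XML looks like this: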

<?xml version="1.0" encoding="utf-8" ?>

<YourMembership_Response>
<Items>
<Item>
<ItemID></ItemID>
<ID>92304823A-2932</ID>
<WebsiteID>0987</WebsiteID>
<NamePrefix></NamePrefix>
<FirstName>John</FirstName>
<MiddleName></MiddleName>
<LastName>Smith</LastName>
<Suffix></Suffix>
<Nickname></Nickname>
<EmployerName>abc company</EmployerName>
<WorkTitle>manager</WorkTitle>
<Date>3/14/2013 2:12:39 PM</Date>
<Description>Removed from group by Administration.</Description>
</Item>
<Item>
<ItemID></ItemID>
<ID>92304823A-2932</ID>
<WebsiteID>0987</WebsiteID>
<NamePrefix></NamePrefix>
<FirstName>John</FirstName>
<MiddleName></MiddleName>
<LastName>Smith</LastName>
<Suffix></Suffix>
<Nickname></Nickname>
<EmployerName>abc company</EmployerName>
<WorkTitle>manager</WorkTitle>
<Date>3/14/2013 2:12:39 PM</Date>
<Description>Removed from group by Administration.</Description>
</Item>
</Items>
</YourMembership_Response>

I have written this code, which writes only the ID to the CSV, and it works fine.

with open("output1.csv", "wb") as f:
    writer = csv.writer(f)
    for node in tree.findall('.//ID'):
        writer.writerow([node.text])

Now, when I try to write multiple data points to the CSV, they all just get appended into a single column. This is the code I have been trying:

with open("test1.csv", "wb") as f:
    writer = csv.writer(f)
    for node in tree.findall('.//ID'):
        writer.writerow([node.text])
    for node in tree.findall('.//FirstName'):
        writer.writerow([node.text])
    for node in tree.findall('.//LastName'):
        writer.writerow([node.text]) 

I need the data to look like this in the CSV, along with other data points I will pick later. What am I doing wrong?

ID                    FirstName     LastName
92304823A-2932         John           Smith

Thanks in advance.

1 answer:

Answer 0: (score: 1)

This is essentially how to collect the data.

>>> from xml.etree import ElementTree
>>> tree = ElementTree.parse('api.xml')
>>> tree.findall('.//Item')
[<Element 'Item' at 0x0000000006679EA8>, <Element 'Item' at 0x0000000006681318>]
>>> for item in tree.findall('.//Item'):
...     item.find('ID').text, item.find('FirstName').text, item.find('LastName').text
... 
('92304823A-2932', 'John', 'Smith')
('92304823A-2932', 'John', 'Smith')

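By contrast, when you use a construct like tree.findall('.//ID'), you are asking the XPath engine to start at tree (that is the '.' part) and look down through the whole document for every occurrence of 'ID' at once. For your sample XML, that returns a flat list of the two ID elements, with nothing tying each one to the FirstName and LastName of the same Item. Because each writerow call also emits a separate row, running three such loops stacks all the values in a single column. What you need to do instead is find all the Item entries first, then pull the three pieces of data of interest out of each Item.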
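To make the row-versus-column point concrete, here is a minimal sketch that builds one list per Item and passes it to a single writerow call, so each value lands in its own column. It assumes a trimmed-down copy of the sample XML held in a string, and it writes to an in-memory buffer rather than a real file; the names SAMPLE_XML, FIELDS and csv_text are illustrative only.

```python
import csv
import io
from xml.etree import ElementTree

# Trimmed-down sample modelled on the XML in the question (assumption).
SAMPLE_XML = """<YourMembership_Response>
<Items>
<Item>
<ID>92304823A-2932</ID>
<FirstName>John</FirstName>
<LastName>Smith</LastName>
</Item>
<Item>
<ID>92304823A-2932</ID>
<FirstName>John</FirstName>
<LastName>Smith</LastName>
</Item>
</Items>
</YourMembership_Response>"""

FIELDS = ['ID', 'FirstName', 'LastName']  # columns to export (illustrative)

root = ElementTree.fromstring(SAMPLE_XML)
buffer = io.StringIO()  # in-memory stand-in for a real output file
writer = csv.writer(buffer, lineterminator='\n')
writer.writerow(FIELDS)  # header row
for item in root.findall('.//Item'):
    # One writerow call per Item; each list element becomes one column.
    writer.writerow([item.find(name).text for name in FIELDS])

csv_text = buffer.getvalue()
print(csv_text)
```

Adding another data point later should then only require appending its tag name to FIELDS, provided every Item contains that tag.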

Addendum (Python 3 syntax; on Python 2.7, open the file with 'wb' and drop newline=''):

>>> import csv
>>> with open('api.csv', 'w', newline='') as csvfile:
...     fieldnames = ['ID', 'FirstName', 'LastName']
...     writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
...     writer.writeheader()
...     for item in tree.findall('.//Item'):
...         writer.writerow({
...             'ID': item.find('ID').text,
...             'FirstName': item.find('FirstName').text,
...             'LastName': item.find('LastName').text})

Resulting output file:

ID,FirstName,LastName
92304823A-2932,John,Smith
92304823A-2932,John,Smith
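
Since the question mentions picking other data points later, one thing to watch for is a tag that is absent from some Item: item.find(...) then returns None, and accessing .text on it raises AttributeError. The sketch below is one possible way around that, using findtext with a default. The helper name items_to_csv and the sample string are hypothetical, and the output goes to an in-memory buffer instead of a file.

```python
import csv
import io
from xml.etree import ElementTree


def items_to_csv(xml_text, fieldnames):
    """Hypothetical helper: return CSV text with one row per <Item>."""
    root = ElementTree.fromstring(xml_text)
    buffer = io.StringIO()  # in-memory stand-in for a real output file
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator='\n')
    writer.writeheader()
    for item in root.findall('.//Item'):
        # findtext returns the default when the tag is missing and an
        # empty string when the tag is present but has no text.
        writer.writerow({name: item.findtext(name, default='')
                         for name in fieldnames})
    return buffer.getvalue()


# Illustrative input: MiddleName is empty, Nickname is missing entirely.
sample = """<Items>
<Item>
<ID>92304823A-2932</ID>
<FirstName>John</FirstName>
<MiddleName></MiddleName>
<LastName>Smith</LastName>
</Item>
</Items>"""

result = items_to_csv(
    sample, ['ID', 'FirstName', 'MiddleName', 'LastName', 'Nickname'])
print(result)
```

Both the empty and the missing field come out as empty CSV cells, so the columns stay aligned.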