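I have spent a few hours looking for a solution to this problem and come up empty-handed. I am trying to parse an XML document in Python and return the elements of each row as a comma-separated list.
Here is a sample of the XML document: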
<?xml version="1.0" encoding="utf-8"?>
<Report xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://adcenter.microsoft.com/advertiser/reporting/v5/XMLSchema" ReportName="My DestinationUrl Performance Report" ReportTime="4/7/2014" TimeZone="Various" ReportAggregation="Daily" LastCompletedAvailableDay="4/8/2014 5:00:00 PM (GMT)" LastCompletedAvailableHour="4/8/2014 5:00:00 PM (GMT)" PotentialIncompleteData="false">
<DestinationUrlPerformanceReportColumns>
<Column name="GregorianDate" />
<Column name="AccountName" />
<Column name="CampaignName" />
<Column name="CampaignId" />
<Column name="AdGroupName" />
<Column name="AdGroupId" />
<Column name="DestinationUrl" />
<Column name="Impressions" />
<Column name="Clicks" />
<Column name="Spend" />
<Column name="Conversions" />
</DestinationUrlPerformanceReportColumns>
<Table>
<Row>
<GregorianDate value="4/7/2014" />
<AccountName value="BingAccount" />
<CampaignName value="Campaign#1" />
<CampaignId value="12345678" />
<AdGroupName value="Adgroup1" />
<AdGroupId value="901234567" />
<DestinationUrl value="www.example.com" />
<Impressions value="8" />
<Clicks value="0" />
<Spend value="0.00" />
<Conversions value="0" />
</Row>
<Row>
<GregorianDate value="4/7/2014" />
<AccountName value="BingAccount" />
<CampaignName value="Campaign#2" />
<CampaignId value="83984398493" />
<AdGroupName value="Adgroup#2" />
<AdGroupId value="3439843983" />
<DestinationUrl value="www.example.co.uk" />
<Impressions value="20" />
<Clicks value="2" />
<Spend value="0.10" />
<Conversions value="0" />
</Row>
</Table>
<Copyright>©2014 Microsoft Corporation. All rights reserved. </Copyright>
</Report>
I want to return the values of each row as a comma-separated list, so the desired result would be:
('4/7/2014', 'BingAccount', 'Campaign#1', '12345678', 'Adgroup1', '901234567', 'www.example.com', '8', '0', '0.00', '0')
('4/7/2014', 'BingAccount', 'Campaign#2', '83984398493', 'Adgroup#2', '3439843983', 'www.example.co.uk', '20', '2', '0.10', '0')
This is what I have so far, but I have not been able to get any further:
from xml.dom import minidom
xmldoc = minidom.parse('file.xml')
rows = xmldoc.firstChild.childNodes[3].childNodes
for i in rows:
    print tuple(i.childNodes)
Answer 0 (score: 0)
Try xml.etree.
In [4]: print a
<?xml version="1.0" encoding="utf-8"?>
<Report xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://adcenter.microsoft.com/advertiser/reporting/v5/XMLSchema" ReportName="My DestinationUrl Performance Report" ReportTime="4/7/2014" TimeZone="Various" ReportAggregation="Daily" LastCompletedAvailableDay="4/8/2014 5:00:00 PM (GMT)" LastCompletedAvailableHour="4/8/2014 5:00:00 PM (GMT)" PotentialIncompleteData="false">
<DestinationUrlPerformanceReportColumns>
<Column name="GregorianDate" />
<Column name="AccountName" />
<Column name="CampaignName" />
<Column name="CampaignId" />
<Column name="AdGroupName" />
<Column name="AdGroupId" />
<Column name="DestinationUrl" />
<Column name="Impressions" />
<Column name="Clicks" />
<Column name="Spend" />
<Column name="Conversions" />
</DestinationUrlPerformanceReportColumns>
<Table>
<Row>
<GregorianDate value="4/7/2014" />
<AccountName value="BingAccount" />
<CampaignName value="Campaign#1" />
<CampaignId value="12345678" />
<AdGroupName value="Adgroup1" />
<AdGroupId value="901234567" />
<DestinationUrl value="www.example.com" />
<Impressions value="8" />
<Clicks value="0" />
<Spend value="0.00" />
<Conversions value="0" />
</Row>
<Row>
<GregorianDate value="4/7/2014" />
<AccountName value="BingAccount" />
<CampaignName value="Campaign#2" />
<CampaignId value="83984398493" />
<AdGroupName value="Adgroup#2" />
<AdGroupId value="3439843983" />
<DestinationUrl value="www.example.co.uk" />
<Impressions value="20" />
<Clicks value="2" />
<Spend value="0.10" />
<Conversions value="0" />
</Row>
</Table>
<Copyright>©2014 Microsoft Corporation. All rights reserved. </Copyright>
</Report>
In [5]: import xml.etree.ElementTree as ET
In [6]: root = ET.fromstring(a)
In [7]: [tuple([y.attrib['value'] for y in x]) for x in root[1]]
Out[7]:
[('4/7/2014',
'BingAccount',
'Campaign#1',
'12345678',
'Adgroup1',
'901234567',
'www.example.com',
'8',
'0',
'0.00',
'0'),
('4/7/2014',
'BingAccount',
'Campaign#2',
'83984398493',
'Adgroup#2',
'3439843983',
'www.example.co.uk',
'20',
'2',
'0.10',
'0')]
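Note that root[1] relies on <Table> being the second child of <Report>. If that order is not guaranteed, the rows can be looked up by tag name instead. Here is a minimal sketch of that variant, assuming the real file has the same layout as the sample: the NS prefix "r", the helper name rows_as_tuples, and the shortened SAMPLE string are made up for illustration, while the namespace URI is the default one declared on <Report>.

```python
import xml.etree.ElementTree as ET

# Default namespace declared on the <Report> root element in the sample.
# The prefix "r" is arbitrary and only used inside this script.
NS = {"r": "http://adcenter.microsoft.com/advertiser/reporting/v5/XMLSchema"}

# Shortened stand-in for the report (assumption: same layout as the real file).
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<Report xmlns="http://adcenter.microsoft.com/advertiser/reporting/v5/XMLSchema">
  <DestinationUrlPerformanceReportColumns>
    <Column name="GregorianDate" />
    <Column name="AccountName" />
    <Column name="Clicks" />
  </DestinationUrlPerformanceReportColumns>
  <Table>
    <Row>
      <GregorianDate value="4/7/2014" />
      <AccountName value="BingAccount" />
      <Clicks value="0" />
    </Row>
    <Row>
      <GregorianDate value="4/7/2014" />
      <AccountName value="BingAccount" />
      <Clicks value="2" />
    </Row>
  </Table>
</Report>"""


def rows_as_tuples(root):
    """Return one tuple of 'value' attributes per <Row>, in document order."""
    return [
        # A cell without a 'value' attribute becomes an empty string.
        tuple(cell.get("value", "") for cell in row)
        for row in root.findall("r:Table/r:Row", NS)
    ]


# Passing bytes lets the parser honour the encoding in the XML declaration.
root = ET.fromstring(SAMPLE.encode("utf-8"))
rows = rows_as_tuples(root)

for row in rows:
    # One comma-separated line per row.
    print(", ".join(row))
```

For a file on disk, ET.parse('file.xml').getroot() can replace the ET.fromstring call. If any value may itself contain a comma, writing the rows with the csv module is safer than a plain join.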