Parsing XML in Python: struggling to understand how to do it

Time: 2014-12-03 12:16:29

Tags: python xml parsing

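I have looked through a number of help pages, examples, and docs, but I am still struggling to understand how to achieve my goal in Python.

I need to process an XML feed and pull only a few very specific values out of the document. This is where I am stuck.

The XML looks like this: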

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<feed>
 <title type="text">DailyTreasuryYieldCurveRateData</title>
 <id></id>
 <updated>2014-12-03T07:44:30Z</updated>
 <link rel="self" title="DailyTreasuryYieldCurveRateData" href="DailyTreasuryYieldCurveRateData" />
 <entry>
 <id></id>
<title type="text"></title>
<updated>2014-12-03T07:44:30Z</updated>
<author>
  <name />
</author>
<link rel="edit" title="DailyTreasuryYieldCurveRateDatum" href="DailyTreasuryYieldCurveRateData(6235)" />
<category />
<content type="application/xml">
  <m:properties>
    <d:Id m:type="Edm.Int32">6235</d:Id>
    <d:NEW_DATE m:type="Edm.DateTime">2014-12-01T00:00:00</d:NEW_DATE>
    <d:BC_1MONTH m:type="Edm.Double">0.01</d:BC_1MONTH>
    <d:BC_3MONTH m:type="Edm.Double">0.03</d:BC_3MONTH>
    <d:BC_6MONTH m:type="Edm.Double">0.08</d:BC_6MONTH>
    <d:BC_1YEAR m:type="Edm.Double">0.13</d:BC_1YEAR>
    <d:BC_2YEAR m:type="Edm.Double">0.49</d:BC_2YEAR>
    <d:BC_3YEAR m:type="Edm.Double">0.9</d:BC_3YEAR>
    <d:BC_5YEAR m:type="Edm.Double">1.52</d:BC_5YEAR>
    <d:BC_7YEAR m:type="Edm.Double">1.93</d:BC_7YEAR>
    <d:BC_10YEAR m:type="Edm.Double">2.22</d:BC_10YEAR>
    <d:BC_20YEAR m:type="Edm.Double">2.66</d:BC_20YEAR>
    <d:BC_30YEAR m:type="Edm.Double">2.95</d:BC_30YEAR>
    <d:BC_30YEARDISPLAY m:type="Edm.Double">2.95</d:BC_30YEARDISPLAY>
  </m:properties>
 </content>
</entry>
<entry>
<id></id>
<title type="text"></title>
<updated>2014-12-03T07:44:30Z</updated>
<author>
  <name />
</author>
<link rel="edit" title="DailyTreasuryYieldCurveRateDatum" href="DailyTreasuryYieldCurveRateData(6236)" />
<category />
<content type="application/xml">
  <m:properties>
    <d:Id m:type="Edm.Int32">6236</d:Id>
    <d:NEW_DATE m:type="Edm.DateTime">2014-12-02T00:00:00</d:NEW_DATE>
    <d:BC_1MONTH m:type="Edm.Double">0.04</d:BC_1MONTH>
    <d:BC_3MONTH m:type="Edm.Double">0.03</d:BC_3MONTH>
    <d:BC_6MONTH m:type="Edm.Double">0.08</d:BC_6MONTH>
    <d:BC_1YEAR m:type="Edm.Double">0.14</d:BC_1YEAR>
    <d:BC_2YEAR m:type="Edm.Double">0.55</d:BC_2YEAR>
    <d:BC_3YEAR m:type="Edm.Double">0.96</d:BC_3YEAR>
    <d:BC_5YEAR m:type="Edm.Double">1.59</d:BC_5YEAR>
    <d:BC_7YEAR m:type="Edm.Double">2</d:BC_7YEAR>
    <d:BC_10YEAR m:type="Edm.Double">2.28</d:BC_10YEAR>
    <d:BC_20YEAR m:type="Edm.Double">2.72</d:BC_20YEAR>
    <d:BC_30YEAR m:type="Edm.Double">3</d:BC_30YEAR>
    <d:BC_30YEARDISPLAY m:type="Edm.Double">3</d:BC_30YEARDISPLAY>
  </m:properties>
</content>
</entry>
</feed>

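A new entry is appended to this XML document over the course of each month; it resets and starts over on the 1st of the following month.

I need to extract the date from d:NEW_DATE and the value from d:BC_10YEAR. That is no problem when there is only one entry, but I am struggling to work out how to walk through the file and extract the relevant date and value from each ENTRY block.

Any help is greatly appreciated.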

1 Answer:

Answer 0: (score: 0)

BeautifulSoup is probably the easiest way to get what you are looking for:

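# Updated for Python 3 and BeautifulSoup 4 (the original used the older BeautifulSoup 3 import).
from bs4 import BeautifulSoup

# Read the whole feed into memory; the XML declares utf-8.
with open('datafile.xml', 'r', encoding='utf-8') as f:
    xmldoc = f.read()

# html.parser lowercases tag names, hence the lowercase lookups below.
bs = BeautifulSoup(xmldoc, 'html.parser')

entryList = bs.find_all('entry')

for entry in entryList:
    properties = entry.content.find('m:properties')
    print(properties.find('d:new_date').contents[0])
    print(properties.find('d:bc_10year').contents[0])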

You can then replace print with whatever you want to do with the data (appending to a list, and so on).
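
If you would rather avoid a third-party dependency, roughly the same loop can be written with the standard library's xml.etree.ElementTree, collecting the results into a list instead of printing them. This is only a sketch under a few assumptions: the xmlns declarations in the sample string are not in the snippet from the question (the m: and d: prefixes have to be declared somewhere for a strict XML parser to accept the feed, and the URIs used here are assumed), the function name extract_10year_yields is made up for illustration, and the "{*}" namespace wildcard needs Python 3.8 or later.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Trimmed-down copy of the feed from the question. The xmlns attributes are an
# ASSUMPTION: the pasted snippet omits them, but a strict XML parser rejects
# undeclared prefixes such as m: and d:.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
      xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
  <entry>
    <content type="application/xml">
      <m:properties>
        <d:NEW_DATE m:type="Edm.DateTime">2014-12-01T00:00:00</d:NEW_DATE>
        <d:BC_10YEAR m:type="Edm.Double">2.22</d:BC_10YEAR>
      </m:properties>
    </content>
  </entry>
  <entry>
    <content type="application/xml">
      <m:properties>
        <d:NEW_DATE m:type="Edm.DateTime">2014-12-02T00:00:00</d:NEW_DATE>
        <d:BC_10YEAR m:type="Edm.Double">2.28</d:BC_10YEAR>
      </m:properties>
    </content>
  </entry>
</feed>"""


def extract_10year_yields(xml_text):
    """Return a list of (date, 10-year yield) tuples, one per <entry>."""
    root = ET.fromstring(xml_text)
    rows = []
    # "{*}" matches the tag in any namespace (Python 3.8+), so the exact
    # namespace URIs do not have to be spelled out in the lookups.
    for entry in root.findall("{*}entry"):
        date_el = entry.find(".//{*}NEW_DATE")
        rate_el = entry.find(".//{*}BC_10YEAR")
        if date_el is None or rate_el is None:
            continue  # skip entries that lack either field
        day = datetime.strptime(date_el.text, "%Y-%m-%dT%H:%M:%S").date()
        rows.append((day, float(rate_el.text)))
    return rows


for day, rate in extract_10year_yields(SAMPLE_FEED):
    print(day, rate)
```

To run it against the real file, read the file contents and pass them to the function in place of SAMPLE_FEED. Unlike the BeautifulSoup version, the tag names here keep their original upper case.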