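I have gone through several support pages, examples, and docs, but I still can't work out how to do what I need in Python.
I need to process/parse an XML feed and pull only a few very specific values out of the XML document. This is where I'm stuck.
The XML looks like this: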
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<feed>
<title type="text">DailyTreasuryYieldCurveRateData</title>
<id></id>
<updated>2014-12-03T07:44:30Z</updated>
<link rel="self" title="DailyTreasuryYieldCurveRateData" href="DailyTreasuryYieldCurveRateData" />
<entry>
<id></id>
<title type="text"></title>
<updated>2014-12-03T07:44:30Z</updated>
<author>
<name />
</author>
<link rel="edit" title="DailyTreasuryYieldCurveRateDatum" href="DailyTreasuryYieldCurveRateData(6235)" />
<category />
<content type="application/xml">
<m:properties>
<d:Id m:type="Edm.Int32">6235</d:Id>
<d:NEW_DATE m:type="Edm.DateTime">2014-12-01T00:00:00</d:NEW_DATE>
<d:BC_1MONTH m:type="Edm.Double">0.01</d:BC_1MONTH>
<d:BC_3MONTH m:type="Edm.Double">0.03</d:BC_3MONTH>
<d:BC_6MONTH m:type="Edm.Double">0.08</d:BC_6MONTH>
<d:BC_1YEAR m:type="Edm.Double">0.13</d:BC_1YEAR>
<d:BC_2YEAR m:type="Edm.Double">0.49</d:BC_2YEAR>
<d:BC_3YEAR m:type="Edm.Double">0.9</d:BC_3YEAR>
<d:BC_5YEAR m:type="Edm.Double">1.52</d:BC_5YEAR>
<d:BC_7YEAR m:type="Edm.Double">1.93</d:BC_7YEAR>
<d:BC_10YEAR m:type="Edm.Double">2.22</d:BC_10YEAR>
<d:BC_20YEAR m:type="Edm.Double">2.66</d:BC_20YEAR>
<d:BC_30YEAR m:type="Edm.Double">2.95</d:BC_30YEAR>
<d:BC_30YEARDISPLAY m:type="Edm.Double">2.95</d:BC_30YEARDISPLAY>
</m:properties>
</content>
</entry>
<entry>
<id></id>
<title type="text"></title>
<updated>2014-12-03T07:44:30Z</updated>
<author>
<name />
</author>
<link rel="edit" title="DailyTreasuryYieldCurveRateDatum" href="DailyTreasuryYieldCurveRateData(6236)" />
<category />
<content type="application/xml">
<m:properties>
<d:Id m:type="Edm.Int32">6236</d:Id>
<d:NEW_DATE m:type="Edm.DateTime">2014-12-02T00:00:00</d:NEW_DATE>
<d:BC_1MONTH m:type="Edm.Double">0.04</d:BC_1MONTH>
<d:BC_3MONTH m:type="Edm.Double">0.03</d:BC_3MONTH>
<d:BC_6MONTH m:type="Edm.Double">0.08</d:BC_6MONTH>
<d:BC_1YEAR m:type="Edm.Double">0.14</d:BC_1YEAR>
<d:BC_2YEAR m:type="Edm.Double">0.55</d:BC_2YEAR>
<d:BC_3YEAR m:type="Edm.Double">0.96</d:BC_3YEAR>
<d:BC_5YEAR m:type="Edm.Double">1.59</d:BC_5YEAR>
<d:BC_7YEAR m:type="Edm.Double">2</d:BC_7YEAR>
<d:BC_10YEAR m:type="Edm.Double">2.28</d:BC_10YEAR>
<d:BC_20YEAR m:type="Edm.Double">2.72</d:BC_20YEAR>
<d:BC_30YEAR m:type="Edm.Double">3</d:BC_30YEAR>
<d:BC_30YEARDISPLAY m:type="Edm.Double">3</d:BC_30YEARDISPLAY>
</m:properties>
</content>
</entry>
</feed>
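This XML document has a new entry appended to it over the course of each month, and it resets and starts over on the 1st of the following month.
I need to extract the date from d:NEW_DATE and the value from d:BC_10YEAR. With only one entry that is no problem, but I'm struggling to work out how to walk through the file and pull the relevant date and value out of each ENTRY block.
Any help is much appreciated.
Answer 0 (score: 0)
BeautifulSoup is probably the easiest way to get what you're looking for: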
from bs4 import BeautifulSoup  # BeautifulSoup 4: pip install beautifulsoup4

# html.parser lowercases tag names, hence 'd:new_date' rather than 'd:NEW_DATE'
with open('datafile.xml', 'r') as f:
    bs = BeautifulSoup(f.read(), 'html.parser')

for entry in bs.find_all('entry'):
    props = entry.find('m:properties')
    print(props.find('d:new_date').get_text())
    print(props.find('d:bc_10year').get_text())
You can then replace the print calls with whatever processing you need (appending to a list, etc.).
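If you would rather not install anything, roughly the same loop can be sketched with the standard library's xml.etree.ElementTree, collecting the results into a list instead of printing them. This is only a sketch under some assumptions: the XML as pasted in the question does not declare the m: and d: prefixes, so I am assuming the real feed declares them on <feed>, and the namespace URIs in SAMPLE below are placeholders I made up for illustration. The helper names local_name and extract_rates are mine as well. Matching on the local tag name means the code does not need to know the real URIs.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Trimmed stand-in for the feed. ASSUMPTION: the xmlns declarations are
# placeholders; the question's paste omits them, the real feed should have them.
SAMPLE = """<feed xmlns="urn:example:atom" xmlns:m="urn:example:m" xmlns:d="urn:example:d">
  <entry>
    <content type="application/xml">
      <m:properties>
        <d:NEW_DATE m:type="Edm.DateTime">2014-12-01T00:00:00</d:NEW_DATE>
        <d:BC_10YEAR m:type="Edm.Double">2.22</d:BC_10YEAR>
      </m:properties>
    </content>
  </entry>
  <entry>
    <content type="application/xml">
      <m:properties>
        <d:NEW_DATE m:type="Edm.DateTime">2014-12-02T00:00:00</d:NEW_DATE>
        <d:BC_10YEAR m:type="Edm.Double">2.28</d:BC_10YEAR>
      </m:properties>
    </content>
  </entry>
</feed>"""


def local_name(tag):
    """Strip the '{namespace}' part that ElementTree puts in front of tag names."""
    return tag.rsplit('}', 1)[-1]


def extract_rates(xml_text):
    """Return a list of (date, 10-year rate) tuples, one per <entry>."""
    root = ET.fromstring(xml_text)
    rows = []
    for entry in root.iter():
        if local_name(entry.tag) != 'entry':
            continue
        # Map local tag name -> text for every element inside this entry.
        values = {local_name(el.tag): el.text for el in entry.iter()}
        day = datetime.strptime(values['NEW_DATE'], '%Y-%m-%dT%H:%M:%S').date()
        rows.append((day, float(values['BC_10YEAR'])))
    return rows


rows = extract_rates(SAMPLE)
for day, rate in rows:
    print(day, rate)
```

For a file on disk you would pass its contents to extract_rates instead of SAMPLE. An entry that lacks either field raises KeyError here, which may or may not be what you want for a feed that resets every month.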