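I am trying to get some attributes from a long XML document. Specifically, I want the stage levels in cfs and ft, which this code retrieves reliably. The difficulty is that I can't figure out how to extract the timestamp from the tag as a datetime value. The tag looks something like this: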
<ns1:value qualifiers="P" dateTime="2012-11-01T18:45:00.000-05:00">54800</ns1:value>
Any help or suggestions for improvement would be much appreciated.
import urllib2
from xml.dom.minidom import parseString

def getLevels(gaugeId):
    # create url string 00060=cfs and 00065=ft
    urlRoot = "http://waterservices.usgs.gov/nwis/iv/?format=waterml,1.1&sites="
    urlTail = "&parameterCd=00060,00065"
    url = urlRoot + str(gaugeId) + urlTail
    del urlRoot, urlTail
    # open connection to url
    urlFile = urllib2.urlopen(url)
    # convert urlFile to string data:
    urlData = urlFile.read()
    # close file to release memory
    urlFile.close()
    # parse downloaded xml
    domData = parseString(urlData)
    # extract xml element values for stage cfs and ft
    index = 0
    elementCount = domData.getElementsByTagName("ns1:value").length
    output = []
    # use < rather than >= so the last index stays inside the node list
    while index < elementCount:
        xmlString = domData.getElementsByTagName("ns1:value")[index].toxml()
        # stripXmlTags is a helper of mine that is not shown here
        output.append(stripXmlTags(xmlString))
        index = index + 1
    # extract and return
    return output
Answer 0 (score: 0)
You can use this as a starting point. Note that it currently ignores the time zone.
from datetime import datetime
from xml.etree import ElementTree as ET

tree = ET.fromstring(urlData)
for elem in tree.findall('.//{http://www.cuahsi.org/waterML/1.1/}value'):
    # [:-10] drops the trailing ".000-05:00" (milliseconds and UTC offset)
    print datetime.strptime(elem.attrib['dateTime'][:-10], '%Y-%m-%dT%H:%M:%S')
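If the time zone matters, one option is to keep the offset instead of slicing it off. The following is only a sketch, and it assumes Python 3.7 or later (unlike the Python 2 code in this thread), where `datetime.fromisoformat` accepts the offset form shown in the question's tag. The variable names are illustrative.

```python
from datetime import datetime, timezone

# Timestamp copied from the sample tag in the question.
raw = "2012-11-01T18:45:00.000-05:00"

# fromisoformat (Python 3.7+) keeps the -05:00 offset as tzinfo.
local_time = datetime.fromisoformat(raw)

# Normalise to UTC, which is handy when comparing gauges in different zones.
utc_time = local_time.astimezone(timezone.utc)

print(local_time.isoformat())
print(utc_time.isoformat())
```

The resulting objects are timezone-aware, so they should not be mixed with the naive datetimes produced by the slicing approach above.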
Answer 1 (score: 0)
ElementTree's iter() method is also handy for getting some of the data you want, as shown below. Some sample output follows the program.
#!/usr/bin/env python
from xml.etree import cElementTree as ET
from datetime import datetime
import re
with open('waterservices.usgs.gov.xml','r') as fi:
    waterData = ''.join(fi.readlines())
waterData = re.sub('ns[12]:', '', waterData)
root = ET.fromstring(waterData)
dates = [v.get('dateTime') for v in root.iter('value')]
valus = [float(v.text) for v in root.iter('value')]
units = [v.text for v in root.iter('variableName')]
print 'valus', valus
print 'units', units
print 'dates', dates
dates = [datetime.strptime(t[:-6], '%Y-%m-%dT%H:%M:%S.%f') for t in dates]
print 'dates', dates
a = zip (valus, units, dates)
for v in a:
    print v
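(Note that I don't know how to handle the ns1: and ns2: prefixes properly, so the code above suppresses them with re.sub. For brevity, the demo code reads the data from a file rather than keeping the urllib2 code. Also, as mentioned in the previous answer, the time zone is not handled.) The sample output below is based on the XML data file linked from the question, saved as the local file waterservices.usgs.gov.xml.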
valus [53200.0, 6.86]
units ['Streamflow, ft³/s', 'Gage height, ft']
dates ['2012-11-01T19:45:00.000-05:00', '2012-11-01T19:45:00.000-05:00']
dates [datetime.datetime(2012, 11, 1, 19, 45), datetime.datetime(2012, 11, 1, 19, 45)]
(53200.0, 'Streamflow, ft³/s', datetime.datetime(2012, 11, 1, 19, 45))
(6.86, 'Gage height, ft', datetime.datetime(2012, 11, 1, 19, 45))
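On the ns1: and ns2: prefixes mentioned in the note above: rather than deleting them with re.sub, ElementTree can be handed a prefix-to-URI mapping when searching. The sketch below assumes Python 3.7+ and uses the namespace URI from the first answer. The `SAMPLE` document is a hypothetical, heavily trimmed stand-in for the real USGS response, and `read_values` and the `wml` prefix are illustrative names.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Namespace URI taken from the first answer; the prefix chosen here is
# arbitrary and does not have to match the ns1: used in the document.
NS = {"wml": "http://www.cuahsi.org/waterML/1.1/"}

# Hypothetical, heavily trimmed stand-in for the USGS response.
SAMPLE = """<ns1:timeSeriesResponse xmlns:ns1="http://www.cuahsi.org/waterML/1.1/">
  <ns1:values>
    <ns1:value qualifiers="P" dateTime="2012-11-01T18:45:00.000-05:00">54800</ns1:value>
  </ns1:values>
</ns1:timeSeriesResponse>"""


def read_values(xml_text):
    """Return (reading, timezone-aware datetime) pairs for every value element."""
    root = ET.fromstring(xml_text)
    readings = []
    for elem in root.findall(".//wml:value", NS):
        stamp = datetime.fromisoformat(elem.get("dateTime"))
        readings.append((float(elem.text), stamp))
    return readings


readings = read_values(SAMPLE)
print(readings)
```

Because the lookup goes through the URI, it keeps working even if the service changes which prefix it binds to that namespace.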