Python: Retrieving XML from a URL and writing it to CSV

Time: 2017-10-05 02:44:54

Tags: python xml csv xml-parsing elementtree

I am trying to write a Python script that dynamically reads XML data from a URL (for example, http://www.wrh.noaa.gov/mesowest/getobextXml.php?sid=KCQT&num=72).

The XML is formatted as follows:

<station id="KCQT" name="Los Angeles / USC Campus Downtown" elev="179" lat="34.02355" lon="-118.29122" provider="NWS/FAA">
<ob time="04 Oct 7:10 pm" utime="1507169400">
<variable var="T" description="Temp" unit="F" value="61"/>
<variable var="TD" description="Dewp" unit="F" value="39"/>
<variable var="RH" description="Relh" unit="%" value="45"/>
</ob>
<ob time="04 Oct 7:05 pm" utime="1507169100">
<variable var="T" description="Temp" unit="F" value="61"/>
<variable var="TD" description="Dewp" unit="F" value="39"/>
<variable var="RH" description="Relh" unit="%" value="45"/>
</ob>
<ob time="04 Oct 7:00 pm" utime="1507168800">
<variable var="T" description="Temp" unit="F" value="61"/>
<variable var="TD" description="Dewp" unit="F" value="39"/>
<variable var="RH" description="Relh" unit="%" value="45"/>
</ob>
<ob time="04 Oct 6:55 pm" utime="1507168500">
<variable var="T" description="Temp" unit="F" value="61"/>
<variable var="TD" description="Dewp" unit="F" value="39"/>
<variable var="RH" description="Relh" unit="%" value="45"/>
</ob>
</station>

I only want to retrieve the timestamp and the decimal temperature ("Temp") for every available date (there are more than the 4 shown here).

The output should be a text file in CSV format, with one timestamp and temperature pair per line.

Here is my attempt at the code (it is terrible and does not work at all):

import requests

weatherXML = requests.get("http://www.wrh.noaa.gov/mesowest/getobextXml.php?sid=KCQT&num=72")

import xml.etree.ElementTree as ET
import csv

tree = ET.parse(weatherXML)
root = tree.getroot()

# open file for writing
Time_Temp = open('timestamp_temp.csv', 'w')

#csv writer object
csvwriter = csv.writer(Time_Temp)
time_temp = []

count = 0
for member in root.findall('ob'):
    if count == 0:
        temperature = member.find('T').var
        time_temp.append(temperature)
        csvwriter.writerow(time_temp)
        count = count + 1

    temperature = member.find('T').text
    time_temp.append(temperature)

Time_Temp.close()

Please help.

2 answers:

Answer 0: (score: 0)

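You can iterate over the ob elements, read each ob's time attribute, find the variable element whose var attribute is T and read its value attribute, append both to a list, and write that list to the CSV file: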

import xml.etree.ElementTree as ET
import csv
# Assumes the XML response was saved locally under this file name
tree = ET.parse('getobextXml.php.xml')
root = tree.getroot()
# open file for writing
# 'wb' is the Python 2 mode; on Python 3 use open('timestamp_temp.csv', 'w', newline='')
with open('timestamp_temp.csv', 'wb') as csvfile:
    csvwriter = csv.writer(csvfile)
    csvwriter.writerow(["Time","Temp"])
    for ob in root.iter('ob'):
        time_temp = []
        timestamp = ob.get('time')  # read the time attribute of the ob element
        temp = ob.find("./variable[@var='T']").get('value')  # find the variable element whose var is T and read its value attribute
        time_temp.append(timestamp)
        time_temp.append(temp)
        csvwriter.writerow(time_temp)

Afterwards, timestamp_temp.csv will contain the result:

Time,Temp
04 Oct 8:47 pm,68
04 Oct 7:47 pm,68
04 Oct 6:47 pm,70
04 Oct 5:47 pm,74
04 Oct 4:47 pm,75
04 Oct 3:47 pm,75
04 Oct 2:47 pm,77
04 Oct 1:47 pm,78
04 Oct 12:47 pm,78
04 Oct 11:47 am,76
04 Oct 10:47 am,74
04 Oct 9:47 am,72
...
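
The lookup described in this answer can also be tried without a saved file or network access. The following is only a minimal sketch under assumptions: `SAMPLE_XML` is a shortened, modified copy of the question's sample (the second observation deliberately has no T variable), and the helper names `extract_time_temp` and `rows_to_csv` are hypothetical. Skipping observations that lack a temperature is also an assumption, not something the question asks for.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Shortened, modified copy of the question's sample XML (illustration only).
# The second <ob> has no T variable, to show the None check below.
SAMPLE_XML = """<station id="KCQT" name="Los Angeles / USC Campus Downtown">
<ob time="04 Oct 7:10 pm" utime="1507169400">
<variable var="T" description="Temp" unit="F" value="61"/>
<variable var="TD" description="Dewp" unit="F" value="39"/>
</ob>
<ob time="04 Oct 7:05 pm" utime="1507169100">
<variable var="TD" description="Dewp" unit="F" value="39"/>
</ob>
</station>"""


def extract_time_temp(xml_text):
    """Return [time, temperature] rows for every <ob> that has a T variable."""
    root = ET.fromstring(xml_text)
    rows = []
    for ob in root.iter('ob'):
        temp = ob.find("./variable[@var='T']")
        if temp is None:
            # Assumption: silently skip observations without a temperature.
            continue
        rows.append([ob.get('time'), temp.get('value')])
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text, preceded by a header line."""
    buffer = io.StringIO(newline='')
    writer = csv.writer(buffer)
    writer.writerow(['Time', 'Temp'])
    writer.writerows(rows)
    return buffer.getvalue()


rows = extract_time_temp(SAMPLE_XML)
print(rows_to_csv(rows))
```

Without the `None` check, `find()` returning nothing for an observation would raise an `AttributeError` on `.get('value')`, so the check is worth keeping if the feed can omit a variable.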

Answer 1: (score: 0)

Assuming Python 3, this will work. I have noted the difference for Python 2 where it applies:

import xml.etree.ElementTree as ET
import requests
import csv

weatherXML = requests.get("http://www.wrh.noaa.gov/mesowest/getobextXml.php?sid=KCQT&num=72")
root = ET.fromstring(weatherXML.text)

# Use this with Python 2
# with open('timestamp_temp.csv','wb') as Time_Temp:

with open('timestamp_temp.csv','w',newline='') as Time_Temp:
    csvwriter = csv.writer(Time_Temp)
    csvwriter.writerow(['Time','Temp'])
    for member in root.iterfind('ob'):
        date = member.attrib['time']
        temp = member.find("variable[@var='T']").attrib['value']
        csvwriter.writerow([date,temp])

Output:

Time,Temp
04 Oct 11:47 pm,65
04 Oct 10:47 pm,66
04 Oct 9:47 pm,68
04 Oct 8:47 pm,68
04 Oct 7:47 pm,68
04 Oct 6:47 pm,70
04 Oct 5:47 pm,74
04 Oct 4:47 pm,75
   .
   .
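
The time attribute carries no year, which can make the CSV ambiguous once it is archived. As a hedged variation on the loop above, each ob element in the question's sample also has a utime attribute; the sketch below assumes it holds seconds since the Unix epoch in UTC and converts it to an ISO 8601 string. `SAMPLE_XML` is a shortened copy of the question's sample and `iso_time_temp` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Shortened copy of the question's sample XML (illustration only).
SAMPLE_XML = """<station id="KCQT">
<ob time="04 Oct 7:10 pm" utime="1507169400">
<variable var="T" description="Temp" unit="F" value="61"/>
</ob>
<ob time="04 Oct 7:05 pm" utime="1507169100">
<variable var="T" description="Temp" unit="F" value="61"/>
</ob>
</station>"""


def iso_time_temp(xml_text):
    """Return [ISO 8601 UTC timestamp, temperature] rows, one per <ob>."""
    root = ET.fromstring(xml_text)
    rows = []
    for ob in root.iterfind('ob'):
        # Assumption: utime is seconds since the Unix epoch (UTC).
        stamp = datetime.fromtimestamp(int(ob.attrib['utime']), tz=timezone.utc)
        temp = ob.find("variable[@var='T']").attrib['value']
        rows.append([stamp.isoformat(), temp])
    return rows


iso_rows = iso_time_temp(SAMPLE_XML)
for row in iso_rows:
    print(row)
```

The resulting rows can be passed to `csvwriter.writerows()` in place of the per-row `writerow()` call if this timestamp format is preferred.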