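I am using Google's site to retrieve weather information, and I want to find the values between XML tags. The code below gives me the weather condition for a city, but I can't get other parameters such as temperature. If possible, please also explain how the split function used in the code works: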
import urllib

def getWeather(city):
    # create google weather api url
    url = "http://www.google.com/ig/api?weather=" + urllib.quote(city)
    try:
        # open google weather api url
        f = urllib.urlopen(url)
    except:
        # if there was an error opening the url, return
        return "Error opening url"
    # read contents to a string
    s = f.read()
    # extract weather condition data from xml string
    weather = s.split("<current_conditions><condition data=\"")[-1].split("\"")[0]
    # if there was an error getting the condition, the city is invalid
    if weather == "<?xml version=":
        return "Invalid city"
    # return the weather condition
    return weather

def main():
    while True:
        city = raw_input("Give me a city: ")
        weather = getWeather(city)
        print(weather)

if __name__ == "__main__":
    main()
Thanks.
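Answer 0 (score: 8)
You can't parse XML with regular expressions, so don't try. Here is a start to finding an XML parser in Python. Here is a good site for learning about parsing XML in Python.
Update: Given the new information about PyS60, here is the documentation for using XML from Nokia's website.
Update 2: @Nas Banov requested sample code, so here it is: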
import urllib
from xml.parsers import expat

def start_element_handler(name, attrs):
    """
    My handler for the event that fires when the parser sees an
    opening tag in the XML.
    """
    # If we care about more than just the temp data, we can extend this
    # logic with ``elif``. If the XML gets really hairy, we can create a
    # ``dict`` of handler functions and index it by tag name, e.g.,
    # { 'humidity': humidity_handler }
    if 'temp_c' == name:
        print "The current temperature is %(data)s degrees Celsius." % attrs

def process_weather_conditions():
    """
    Main logic of the POC; set up the parser and handle resource
    cleanup.
    """
    my_parser = expat.ParserCreate()
    my_parser.StartElementHandler = start_element_handler

    # Open the URL before entering the try block, so that f is always
    # bound when the finally clause runs.
    f = urllib.urlopen("http://www.google.com/ig/api?weather=30096")
    # I don't know if the S60 supports try/finally, but that's not
    # the point of the POC.
    try:
        my_parser.ParseFile(f)
    finally:
        f.close()

if __name__ == '__main__':
    process_weather_conditions()
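The comment in the handler above mentions indexing a dict of handler functions by tag name. A minimal sketch of that idea follows, written for Python 3 and run against a hard-coded sample string rather than the live URL. `SAMPLE_XML`, `HANDLERS`, `handle_temp_c`, `handle_humidity`, and `parse_weather` are names made up for this illustration, and the sample document is an assumption modeled on the values shown elsewhere in this thread.

```python
from xml.parsers import expat

# Assumed sample document, modeled on the values quoted in this thread.
SAMPLE_XML = (
    '<?xml version="1.0"?>'
    '<xml_api_reply version="1"><weather><current_conditions>'
    '<condition data="Sunny"/>'
    '<temp_f data="67"/>'
    '<temp_c data="19"/>'
    '<humidity data="Humidity: 61%"/>'
    '</current_conditions></weather></xml_api_reply>'
)

def handle_temp_c(attrs, out):
    # Store the Celsius temperature as an integer.
    out["temp_c"] = int(attrs["data"])

def handle_humidity(attrs, out):
    # Keep the humidity text exactly as the feed provides it.
    out["humidity"] = attrs["data"]

# Hypothetical dispatch table: tag name -> handler function.
HANDLERS = {"temp_c": handle_temp_c, "humidity": handle_humidity}

def parse_weather(xml_text):
    """Run expat over xml_text and collect what the handlers store."""
    results = {}

    def start_element(name, attrs):
        # Look up a handler for this tag; tags without one are ignored.
        handler = HANDLERS.get(name)
        if handler is not None:
            handler(attrs, results)

    parser = expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return results

if __name__ == "__main__":
    print(parse_weather(SAMPLE_XML))
```

Supporting another tag then only means adding one function and one entry to the table.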
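Answer 1 (score: 4)
I recommend using an XML parser, just as Hank Gay suggested. My personal recommendation is lxml, since I am currently using it on a project and it extends the very usable ElementTree interface that already exists in the standard library (xml.etree).
lxml adds support for XPath, XSLT, and various other features that the standard ElementTree module lacks.
Whichever one you choose, an XML parser is the best option, because you will be able to work with the XML document as Python objects. That means your code would look something like this: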
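    # existing code up to...
    s = f.read()
    import lxml.etree as ET
    # fromstring() parses XML held in a string; parse() expects a file or filename
    root = ET.fromstring(s)
    # the tag is current_conditions (plural); ".//" searches at any depth
    current = root.find(".//current_conditions/condition")
    condition_data = current.get("data")
    weather = condition_data
    return weather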
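For comparison, roughly the same lookup can be sketched with the standard library's xml.etree.ElementTree, which also answers the question about temperature: every child of current_conditions ends up in one dict. This is a Python 3 sketch against a hard-coded sample. `SAMPLE_XML` and `current_conditions` are names invented here, and the sample document is an assumption modeled on the values quoted in this thread.

```python
import xml.etree.ElementTree as ET

# Assumed sample document, modeled on the values quoted in this thread.
SAMPLE_XML = (
    '<?xml version="1.0"?>'
    '<xml_api_reply version="1"><weather><current_conditions>'
    '<condition data="Sunny"/>'
    '<temp_f data="67"/>'
    '<temp_c data="19"/>'
    '<humidity data="Humidity: 61%"/>'
    '<icon data="/ig/images/weather/sunny.gif"/>'
    '<wind_condition data="Wind: N at 21 mph"/>'
    '</current_conditions></weather></xml_api_reply>'
)

def current_conditions(xml_text):
    """Map each tag under <current_conditions> to its data attribute."""
    root = ET.fromstring(xml_text)
    # ".//" searches all descendants, so the nesting depth does not matter.
    current = root.find(".//current_conditions")
    if current is None:
        return {}
    return {child.tag: child.get("data") for child in current}

if __name__ == "__main__":
    weather = current_conditions(SAMPLE_XML)
    print(weather["condition"], weather["temp_c"], weather["temp_f"])
```

Returning an empty dict when the element is missing is just one possible choice; raising an error would work equally well for an invalid city.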
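Answer 2 (score: 2)
XML is structured data. You can do much better than string manipulation to get data out of it. The standard library has the sax, dom, and ElementTree modules, and there is also the high-quality lxml library; any of them will do the job for you far more reliably.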
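As one illustration of the dom option, here is a small Python 3 sketch using xml.dom.minidom from the standard library. `SAMPLE_XML` and `get_data` are hypothetical names, and the sample document is an assumption modeled on the values quoted in this thread.

```python
from xml.dom import minidom

# Assumed sample document, modeled on the values quoted in this thread.
SAMPLE_XML = (
    '<?xml version="1.0"?>'
    '<xml_api_reply version="1"><weather><current_conditions>'
    '<condition data="Sunny"/>'
    '<temp_f data="67"/>'
    '<temp_c data="19"/>'
    '</current_conditions></weather></xml_api_reply>'
)

def get_data(xml_text, tag):
    """Return the data attribute of <tag> inside <current_conditions>, or None."""
    doc = minidom.parseString(xml_text)
    blocks = doc.getElementsByTagName("current_conditions")
    if not blocks:
        return None
    # Search only within the first current_conditions element.
    elements = blocks[0].getElementsByTagName(tag)
    if not elements:
        return None
    return elements[0].getAttribute("data")

if __name__ == "__main__":
    print(get_data(SAMPLE_XML, "temp_f"))
```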
Answer 3 (score: 0)
Well, here it is: a solution for this specific case that does not use a full parser:
import urllib

def getWeather(city):
    ''' given city name or postal code,
        return dictionary with current weather conditions
    '''
    url = 'http://www.google.com/ig/api?weather='
    try:
        f = urllib.urlopen(url + urllib.quote(city))
    except:
        return "Error opening url"
    s = f.read().replace('\r','').replace('\n','')
    if '<problem' in s:
        return "Problem retrieving weather (invalid city?)"
    weather = s.split('</current_conditions>')[0] \
               .split('<current_conditions>')[-1] \
               .strip('</>')
    wdict = dict(i.split(' data="') for i in weather.split('"/><'))
    return wdict
And a usage example:
>>> weather = getWeather('94043')
>>> weather
{'temp_f': '67', 'temp_c': '19', 'humidity': 'Humidity: 61%', 'wind_condition': 'Wind: N at 21 mph', 'condition': 'Sunny', 'icon': '/ig/images/weather/sunny.gif'}
>>> weather['humidity']
'Humidity: 61%'
>>> print '%(condition)s\nTemperature %(temp_c)s C (%(temp_f)s F)\n%(humidity)s\n%(wind_condition)s' % weather
Sunny
Temperature 19 C (67 F)
Humidity: 61%
Wind: N at 21 mph
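PS. Note that a fairly trivial change in Google's output format would break this, for example if they added extra spaces or tabs between tags or attributes. They avoid doing so to keep the HTTP response small. But if they did, we would have to get acquainted with regular expressions and re.split().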
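If that ever happened, a whitespace-tolerant variant might look roughly like the Python 3 sketch below. It is still pattern matching, not a real XML parser, and will not cope with everything valid XML allows. `LOOSE_XML`, `ATTR_RE`, and `conditions_from_text` are names invented for this illustration, and the sample text is an assumption.

```python
import re

# Assumed sample with extra spaces, tabs, and newlines between tags and attributes.
LOOSE_XML = (
    '<current_conditions>\n'
    '  <condition  data = "Sunny" />\n'
    '\t<temp_c data="19"/>\n'
    '</current_conditions>'
)

# Matches <name data="value"/> while allowing optional whitespace around each token.
ATTR_RE = re.compile(r'<\s*(\w+)\s+data\s*=\s*"([^"]*)"\s*/\s*>')

def conditions_from_text(text):
    """Return {tag: data} for the tags inside <current_conditions>."""
    block = re.search(
        r'<current_conditions\s*>(.*?)</current_conditions\s*>',
        text,
        re.DOTALL,  # let "." match newlines inside the block
    )
    if block is None:
        return {}
    # findall returns (tag, value) pairs, which dict() accepts directly.
    return dict(ATTR_RE.findall(block.group(1)))

if __name__ == "__main__":
    print(conditions_from_text(LOOSE_XML))
```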
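PPS. How str.split(sep) works is explained in the documentation; here is an excerpt: Return a list of the words in the string, using sep as the delimiter string. ... The sep argument may consist of multiple characters (for example, '1<>2<>3'.split('<>') returns ['1', '2', '3']). So 'text1<tag>text2</tag>text3'.split('</tag>') gives us ['text1<tag>text2', 'text3'], then [0] picks the first element, 'text1<tag>text2', and then we split again on '<tag>' and pick 'text2', which contains the data we are interested in. Quite banal, really.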
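To make the chain concrete, here it is one step at a time, followed by the one-liner from the question applied to a short sample string. The snippet is written for Python 3. The strings `xml` and `bad` are made-up samples, and the variable names are chosen only for this illustration.

```python
s = 'text1<tag>text2</tag>text3'

step1 = s.split('</tag>')      # ['text1<tag>text2', 'text3']
step2 = step1[0]               # 'text1<tag>text2'
step3 = step2.split('<tag>')   # ['text1', 'text2']
inner = step3[-1]              # 'text2'

# The question's one-liner: split on everything up to the opening quote,
# keep the last piece, then cut at the next quote.
xml = '<current_conditions><condition data="Sunny"/><temp_f data="67"/>'
weather = xml.split('<current_conditions><condition data="')[-1].split('"')[0]

# The same trick works for any other tag, e.g. the temperature.
temp_f = xml.split('<temp_f data="')[-1].split('"')[0]

# When the separator is absent, split() returns the whole string as a single
# item, so the chain yields the text before the first quote. That is why the
# question's code compares the result with '<?xml version='.
bad = '<?xml version="1.0"?><problem/>'
fallback = bad.split('<current_conditions><condition data="')[-1].split('"')[0]

if __name__ == "__main__":
    print(inner, weather, temp_f, fallback)
```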