I've run into a problem. I have a regular expression that reads the weather from an RSS feed:
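from urllib import urlopen  # Python 2; in Python 3 this lives in urllib.request
from re import findall

url = 'http://rss.weatherzone.com.au/?u=12994-1285&lt=aploc&lc=9388&obs=1&fc=1&warn=1'
weather_brisbane = urlopen(url)
html_code = weather_brisbane.read()
weather_brisbane.close()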
This is the regular expression:
weather_contents = findall('<b>(.+)</b> (.*)', html_code)
if weather_contents != []:
    print 'Contents'
    for section_heading in weather_contents:
        print section_heading
    print
This is the result I get:
Contents
('Temperature:', '20.1°C\r')
('Feels like:', '20.1°C<br />\r')
('Dew point:', '13.6°C\r')
('Relative humidity:', '66%<br />\r')
('Wind:', 'E at 2 km/h, gusting to 4 km/h\r')
('Rain:', '0.0mm since 9am<br />\r')
('Pressure:', '1024.9 hPa\r')
So my question is: is there a way to get this result instead:
Contents
Temperature: 20.1
Feels like: 20.1
Dew point: 13.6
Relative humidity: 66%
Wind: E at 2 km/h, gusting to 4 km/h
Rain: 0.0mm since 9am
Pressure: 1024.9 hPa
by integrating the strip() function into my existing code?
Answer 0 (score: 1)
The output you are getting appears to be HTML-encoded.
An HTML decoder will take care of that; see: Decode HTML entities in Python string?
So use this code:
# Python 2 module; Python 3 provides html.unescape() instead
from HTMLParser import HTMLParser
h = HTMLParser()
weather_contents = findall('<b>(.+)</b> (.*)', html_code)
if weather_contents != []:
    print 'Contents'
    for section_heading in weather_contents:
        print section_heading[0], h.unescape(section_heading[1])
    print
I think this will display what you want.
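For Python 3, where the HTMLParser module was renamed and its unescape method was later removed, a minimal sketch of the same idea could use html.unescape from the standard library. The sample string is made up for illustration and assumes the feed encodes the degree sign as &#176;; decode_rows is a hypothetical helper name, and the extra replace() and strip() calls are an addition to deal with the trailing \r and <br />, not part of the answer above.

```python
import html
import re

# Hypothetical sample resembling the feed body; the real feed may differ.
sample = (
    "<b>Temperature:</b> 20.1&#176;C\r\n"
    "<b>Feels like:</b> 20.1&#176;C<br />\r\n"
    "<b>Relative humidity:</b> 66%<br />\r\n"
)

def decode_rows(text):
    """Return (label, value) pairs with entities decoded and residue stripped."""
    rows = []
    for label, value in re.findall(r'<b>(.+)</b> (.*)', text):
        value = html.unescape(value)                 # decode '&#176;' to the degree sign
        value = value.replace('<br />', '').strip()  # drop the tag and the trailing '\r'
        rows.append((label, value))
    return rows

for label, value in decode_rows(sample):
    print(label, value)
```

This sketch leaves the unit (°C) in place; dropping it as well would take one more replace() call on the decoded value.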
Answer 1 (score: 1)
There is also an alternative to HTMLParser:
print ' '.join([s.rstrip('\r').rsplit('<br />')[0].rsplit('°C')[0] for s in section_heading])
instead of
print section_heading
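To make the chained calls easier to follow, the same steps could be unpacked into a small helper. This is only a sketch, written for Python 3; clean_field is a hypothetical name, and the sample tuples are copied from the output shown in the question. It uses split where the one-liner uses rsplit; without a maxsplit argument the two return the same list.

```python
def clean_field(s):
    """Strip the trailing carriage return, the <br /> tag, and the °C unit."""
    s = s.rstrip('\r')         # remove the trailing '\r'
    s = s.split('<br />')[0]   # keep only the text before any '<br />'
    s = s.split('°C')[0]       # keep only the text before the unit, if present
    return s

# Sample rows taken from the question's output.
weather_contents = [
    ('Temperature:', '20.1°C\r'),
    ('Feels like:', '20.1°C<br />\r'),
    ('Wind:', 'E at 2 km/h, gusting to 4 km/h\r'),
]

# Clean both elements of each tuple and join them with a space.
lines = [' '.join(clean_field(s) for s in row) for row in weather_contents]
for line in lines:
    print(line)
```

Labels such as 'Temperature:' pass through unchanged because none of the three steps finds anything to remove in them.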
Answer 2 (score: 0)
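# findall returns (label, value) tuples, so apply replace() to the value string
weather_contents = [(label, value.replace('°C', "C")) for label, value in weather_contents]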
This should help clean up your weather_contents.