Question

I've run into a problem. I have a regular expression that I use to read the weather from an RSS feed:

from urllib import urlopen  # Python 2
from re import findall

url = 'http://rss.weatherzone.com.au/?u=12994-1285&lt=aploc&lc=9388&obs=1&fc=1&warn=1'
weather_brisbane = urlopen(url)
html_code = weather_brisbane.read()
weather_brisbane.close()

Here is the regular expression:

weather_contents = findall('<b>(.+)</b> (.*)', html_code)
if weather_contents != []:
    print 'Contents'
    for section_heading in weather_contents:
        print section_heading 
    print

This is the output I get:

Contents
('Temperature:', '20.1&#176;C\r')
('Feels like:', '20.1&#176;C<br />\r')
('Dew point:', '13.6&#176;C\r')
('Relative humidity:', '66%<br />\r')
('Wind:', 'E at 2 km/h, gusting to 4 km/h\r')
('Rain:', '0.0mm since 9am<br />\r')
('Pressure:', '1024.9 hPa\r')

So my question is: is there a way to get this result instead:

Contents
Temperature: 20.1
Feels like: 20.1
Dew point: 13.6
Relative humidity: 66%
Wind: E at 2 km/h, gusting to 4 km/h
Rain: 0.0mm since 9am
Pressure: 1024.9 hPa

by integrating the strip() function into my existing code?

Answer 1

The output you are getting appears to be HTML-encoded.

An HTML decoder will handle that; see: Decode HTML entities in Python string?

So use this code:

from HTMLParser import HTMLParser
h = HTMLParser()
weather_contents = findall('<b>(.+)</b> (.*)', html_code)
if weather_contents != []:
    print 'Contents'
    for section_heading in weather_contents:
        print section_heading[0], h.unescape(section_heading[1]) 
    print

I think this will display what you want.
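
For readers on Python 3, where the HTMLParser module was renamed to html.parser and the unescape method is no longer available in recent releases, the same idea can be sketched with html.unescape. This is only a sketch: the sample tuples are copied from the question's output, the helper name clean_value is hypothetical, and dropping the trailing <br /> and \r is an extra step that the code above does not perform.

```python
from html import unescape

# Sample data copied from the output shown in the question.
weather_contents = [
    ('Temperature:', '20.1&#176;C\r'),
    ('Feels like:', '20.1&#176;C<br />\r'),
    ('Rain:', '0.0mm since 9am<br />\r'),
]

def clean_value(raw):
    """Decode HTML entities, then drop a trailing <br /> and whitespace."""
    # '&#176;' becomes the degree sign; strip() removes the trailing '\r'.
    text = unescape(raw).strip()
    if text.endswith('<br />'):
        text = text[:-len('<br />')].rstrip()
    return text

for heading, value in weather_contents:
    print(heading, clean_value(value))
```

Note that this keeps the decoded degree sign and the C, whereas the desired output in the question drops the unit entirely.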

Answer 2

There is also an alternative to HTMLParser:

print ' '.join([s.rstrip('\r').rsplit('<br />')[0].rsplit('&#176;C')[0] for s in section_heading])

instead of

print section_heading
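
One caveat, since the question asks about strip(): str.strip, lstrip and rstrip treat their argument as a set of characters to remove from the ends, not as a substring, so rstrip('<br />') can also remove trailing characters of the value that happen to be in that set. A small Python 3 sketch of a suffix-based alternative follows; the helper name drop_suffixes and its default suffix list are assumptions chosen to match the sample output in the question.

```python
def drop_suffixes(text, suffixes=('\r', '<br />', '&#176;C')):
    """Remove each suffix, in the given order, if the text ends with it."""
    for suffix in suffixes:
        if text.endswith(suffix):
            text = text[:-len(suffix)]
    return text

# rstrip removes any trailing run of the given characters, not the substring:
print('bar<br />'.rstrip('<br />'))   # the trailing 'r' of 'bar' is removed too
print(drop_suffixes('bar<br />'))     # only the '<br />' suffix is removed

# One tuple from the question's output, cleaned and joined for printing.
row = ('Feels like:', '20.1&#176;C<br />\r')
print(' '.join(drop_suffixes(s) for s in row))
```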

Answer 3

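# findall returns (heading, value) tuples, so apply replace to the value string
weather_contents = [(heading, value.replace('&#176;C', 'C')) for heading, value in weather_contents]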

This should help clean up your weather_contents.

Using the strip function to remove part of a string captured by a regular expression
