Using the strip() function to remove part of a string obtained with a regular expression

Asked: 2014-05-19 08:26:13

Tags: python regex string python-2.7 strip

I've run into a problem. I have a regular expression that reads the weather from an RSS feed:

from urllib import urlopen  # Python 2.7; assumed import, not shown in the original post

url = 'http://rss.weatherzone.com.au/?u=12994-1285&lt=aploc&lc=9388&obs=1&fc=1&warn=1'
weather_brisbane = urlopen(url)
html_code = weather_brisbane.read()
weather_brisbane.close()

This is the regular expression:

from re import findall  # assumed import, not shown in the original post
weather_contents = findall('<b>(.+)</b> (.*)', html_code)
if weather_contents != []:
    print 'Contents'
    for section_heading in weather_contents:
        print section_heading 
    print

I get this result:

Contents
('Temperature:', '20.1&#176;C\r')
('Feels like:', '20.1&#176;C<br />\r')
('Dew point:', '13.6&#176;C\r')
('Relative humidity:', '66%<br />\r')
('Wind:', 'E at 2 km/h, gusting to 4 km/h\r')
('Rain:', '0.0mm since 9am<br />\r')
('Pressure:', '1024.9 hPa\r')

So my question is: is there a way to get this result instead

Contents
Temperature: 20.1
Feels like: 20.1
Dew point: 13.6
Relative humidity: 66%
Wind: E at 2 km/h, gusting to 4 km/h
Rain: 0.0mm since 9am
Pressure: 1024.9 hPa

by integrating the strip() function into my existing code?

3 Answers:

Answer 0 (score: 1)

The output you are getting appears to be HTML-encoded.

An HTML decoder will take care of the entities; see: Decode HTML entities in Python string?

So use this code:

from HTMLParser import HTMLParser
h = HTMLParser()
weather_contents = findall('<b>(.+)</b> (.*)', html_code)
if weather_contents != []:
    print 'Contents'
    for section_heading in weather_contents:
        print section_heading[0], h.unescape(section_heading[1]) 
    print

I think this will display what you want.
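Note that unescape() only decodes entities such as &#176;, so the '<br />' tags and the trailing '\r' would still need to be removed separately. A minimal sketch of how both steps might be combined is below. The helper name clean_value and the sample list are hypothetical, and the try/except import is only there so the same snippet can run on Python 2.7 and Python 3:

```python
import re

try:
    from html import unescape            # Python 3.4+
except ImportError:
    from HTMLParser import HTMLParser    # Python 2.7
    unescape = HTMLParser().unescape


def clean_value(raw):
    """Decode entities, drop <br /> tags and edge whitespace (hypothetical helper)."""
    text = unescape(raw)                     # '&#176;' becomes the degree sign
    text = re.sub(r'<br\s*/?>', '', text)    # remove line-break tags
    text = text.strip()                      # remove '\r' and other edge whitespace
    if text.endswith(u'\u00b0C'):            # drop the unit to match the requested output
        text = text[:-2]
    return text


# Sample tuples copied from the output shown in the question.
sample = [('Temperature:', '20.1&#176;C\r'),
          ('Relative humidity:', '66%<br />\r'),
          ('Rain:', '0.0mm since 9am<br />\r')]

for heading, value in sample:
    print(heading + ' ' + clean_value(value))
```

Here strip() is called without arguments, so it only trims whitespace (including '\r'); the tag and the unit are removed by other means.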

Answer 1 (score: 1)

There is also an alternative to HTMLParser:

print ' '.join([s.rstrip('\r').rsplit('<br />')[0].rsplit('&#176;C')[0] for s in section_heading])

instead of

print section_heading
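The reason for splitting rather than calling strip() with the unwanted text is that strip() and rstrip() treat their argument as a set of characters, not as a substring. A small sketch of the difference follows; remove_suffix is a hypothetical helper written for illustration, not part of the answer's code:

```python
def remove_suffix(text, suffix):
    """Return text without the trailing suffix, if present (hypothetical helper)."""
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


# rstrip() removes any trailing characters found in the given set, so it
# also eats the final '1' of the number.
print('20.1&#176;C'.rstrip('&#176;C'))   # prints 20.

raw = '20.1&#176;C<br />\r'
value = raw.rstrip('\r')                 # safe: '\r' is a single character
value = remove_suffix(value, '<br />')
value = remove_suffix(value, '&#176;C')
print(value)                             # prints 20.1
```

So rstrip('\r') is fine for the carriage return, but multi-character pieces such as '<br />' are better removed as whole substrings.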

Answer 2 (score: 0)

weather_contents = [(heading, value.replace('&#176;C', "C")) for heading, value in weather_contents]

This should help clean up your weather_contents.
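Keep in mind that findall returns a list of (heading, value) tuples when the pattern has two groups, and tuples have no replace() method, so the replacement has to be applied to the value element of each tuple. A minimal sketch, using a hypothetical sample string reconstructed from the output in the question rather than the live feed:

```python
from re import findall

# Hypothetical sample, reconstructed from the output shown in the question;
# the real feed may be laid out differently.
html_code = ('<b>Temperature:</b> 20.1&#176;C\r\n'
             '<b>Feels like:</b> 20.1&#176;C<br />\r\n'
             '<b>Pressure:</b> 1024.9 hPa\r\n')

weather_contents = findall('<b>(.+)</b> (.*)', html_code)

# Each match is a (heading, value) tuple, so rebuild the tuples explicitly.
cleaned = [(heading, value.replace('&#176;C', 'C'))
           for heading, value in weather_contents]

for heading, value in cleaned:
    # Also drop the '<br />' tag and the trailing '\r' before printing.
    print(heading + ' ' + value.replace('<br />', '').strip())
```

This keeps the unit as a plain 'C'; replacing with an empty string instead would drop it entirely, as in the output requested in the question.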