使用相同的span类和嵌套跨度检索bbc天气数据

时间:2015-04-15 22:04:30

标签: python beautifulsoup

我正试图从BBC天气中提取数据,以便在家庭自动化仪表板中使用。

HTML代码我可以拉得很好,我可以拉一组临时数,但它只是拉第一个。

</li>
<li class="daily__day-tab day-20150418 ">
<a data-ajax-href="/weather/en/2646504/daily/2015-04-18?day=3" href="/weather/2646504?day=3" rel="nofollow">
<div class="daily__day-header">
<h3 class="daily__day-date">
<span aria-label="Saturday" class="day-name">Sat</span>
</h3>
</div>
<span class="weather-type-image weather-type-image-40" title="Sunny"><img alt="Sunny" src="http://static.bbci.co.uk/weather/0.5.327/images/icons/tab_sprites/40px/1.png"/></span>
<span class="max-temp max-temp-value"> <span class="units-values temperature-units-values"><span class="units-value temperature-value temperature-value-unit-c" data-unit="c">13<span class="unit">°C</span></span><span class="unit-types-separator"> </span><span class="units-value temperature-value temperature-value-unit-f" data-unit="f">55<span class="unit">°F</span></span></span></span>
<span class="min-temp min-temp-value"> <span class="units-values temperature-units-values"><span class="units-value temperature-value temperature-value-unit-c" data-unit="c">5<span class="unit">°C</span></span><span class="unit-types-separator"> </span><span class="units-value temperature-value temperature-value-unit-f" data-unit="f">41<span class="unit">°F</span></span></span></span>
<span class="wind wind-speed windrose-icon windrose-icon--average windrose-icon-40 windrose-icon-40--average wind-direction-ene" data-tooltip-kph="31 km/h, East North Easterly" data-tooltip-mph="19 mph, East North Easterly" title="19 mph, East North Easterly">
<span class="speed"> <span class="wind-speed__description wind-speed__description--average">Wind Speed</span>
<span class="units-values windspeed-units-values"><span class="units-value windspeed-value windspeed-value-unit-kph" data-unit="kph">31 <span class="unit">km/h</span></span><span class="unit-types-separator"> </span><span class="units-value windspeed-value windspeed-value-unit-mph" data-unit="mph">19 <span class="unit">mph</span></span></span></span>
<span class="description blq-hide">East North Easterly</span>
</span>

这是我的代码无效

import urllib2
import pprint

from bs4 import BeautifulSoup
htmlFile=urllib2.urlopen('http://www.bbc.co.uk/weather/2646504?day=1')
htmlData = htmlFile.read()
soup = BeautifulSoup(htmlData)

table=soup.find("div","daily-window")
temperatures=[str(tem.contents[0]) for tem in table.find_all("span",class_="units-value temperature-value temperature-value-unit-c")]
mintemp=[str(min.contents[0]) for min in table.find_("span",class_="min-temp min-temp-value")]
maxtemp=[str(min.contents[0]) for min in table.find_all("span",class_="max-temp max-temp-value")]
windspeeds=[str(speed.contents[0]) for speed in table.find_all("span",class_="units-value windspeed-value windspeed-value-unit-mph")]


pprint.pprint(zip(temperatures,temp2,windspeeds))

1 个答案:

答案 0 :(得分:0)

你的最小和最大临时提取是错误的。你只是找到了最小临时跨度(包括c和f格式)。获取内容的第一件事就是给你空字符串。

最小临时标记标识class=min-temp.min-temp-value与c-type min temp class=temperature-value-unit-c不同。所以我建议您使用css选择器。

例如,找到你所有的最小临时跨度可能是

table.select('span.min-temp.min-temp-value span.temperature-value-unit-c')

这意味着选择class=temperature-value-unit-c跨度的所有class=min-temp min-temp-value跨度。

其他信息列表也是如此,例如max_temp wind