Parsing a table from the following website

Time: 2017-11-14 06:22:13

Tags: python-2.7 beautifulsoup html-parsing

I want to collect past weather details for a particular city in India, for every day of 2016. The following website has this data:

https://www.timeanddate.com/weather/india/kanpur/historic?month=1&year=2016

This link has the data for January 2016. There is a nice table on the page.

I want to extract this table

After a lot of trying, I managed to extract a different table, which is this one. But I do not want this one; it does not serve my purpose.

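I want the other, larger table, which gives the data by time of day for every day of that month, because then I can loop over all the months using the URL.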

The problem is that I don't know HTML or anything related to it, so I can't scrape the data myself.

1 answer:

Answer 0: (score: 1)

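It would have been better if you had included some of the code you tried. In any case, this code works for the January table. You can also write a loop to extract the data for other dates.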

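# Python 3 import; on Python 2.7 use `from urllib2 import urlopen` instead.
from urllib.request import urlopen
from bs4 import BeautifulSoup

# The URL must be a single string with no line break inside it.
url = "https://www.timeanddate.com/weather/india/kanpur/historic?month=1&year=2016"
page = urlopen(url)
soup = BeautifulSoup(page, 'lxml')  # the 'lxml' parser needs the lxml package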

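data = []
# Find the table with id "wt-his" and walk over its body rows.
table = soup.find('table', attrs={'id': 'wt-his'})
for tr in table.find('tbody').find_all('tr'):
    row = {}  # not named `dict`, which would shadow the built-in type
    row['time'] = tr.find('th').text.strip()
    all_td = tr.find_all('td')
    # Index 0 is not used; the fields start at index 1.
    row['temp'] = all_td[1].text.strip()
    row['weather'] = all_td[2].text.strip()
    row['wind'] = all_td[3].text.strip()
    arrow = all_td[4].text.strip()
    if arrow == '↑':
        row['wind_dir'] = 'South to North'
    else:
        # Every other arrow currently falls into this branch.
        row['wind_dir'] = 'North to South'

    row['humidity'] = all_td[5].text.strip()
    row['barometer'] = all_td[6].text.strip()
    row['visibility'] = all_td[7].text.strip()

    data.append(row)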
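
The question mentions looping over all the months through the URL. A minimal sketch of that idea follows, assuming the `month` and `year` query parameters seen in the URL above work the same way for every month of 2016 (this is not verified here). `BASE_URL` and `month_urls` are hypothetical names introduced only for illustration.

```python
from urllib.parse import urlencode

# Base address taken from the URL in the question, without the query string.
BASE_URL = "https://www.timeanddate.com/weather/india/kanpur/historic"


def month_urls(year):
    """Return one URL per month of `year`, January through December."""
    return [
        BASE_URL + "?" + urlencode({"month": month, "year": year})
        for month in range(1, 13)
    ]


urls = month_urls(2016)
for u in urls:
    print(u)
```

Each of these URLs could then be passed to `urlopen` and parsed with the same table code, ideally with a short pause between requests.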

Note: the wind_dir logic only distinguishes the '↑' arrow; add cases for the other directions.
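
One possible way to add those cases is a lookup table instead of a longer if/else chain. This is only a sketch: it assumes the cell text is a single arrow character, as the comparison with '↑' in the answer suggests, and it follows the answer's convention that the arrow points the way the wind is blowing. The set of arrow glyphs, `ARROW_TO_DIRECTION` and `wind_direction` are assumptions made for illustration, not something confirmed against the page.

```python
# Hypothetical mapping from arrow glyph to a direction label.
ARROW_TO_DIRECTION = {
    '↑': 'South to North',
    '↓': 'North to South',
    '→': 'West to East',
    '←': 'East to West',
    '↗': 'Southwest to Northeast',
    '↘': 'Northwest to Southeast',
    '↙': 'Northeast to Southwest',
    '↖': 'Southeast to Northwest',
}


def wind_direction(arrow):
    """Translate an arrow glyph into a label; unknown glyphs give 'unknown'."""
    return ARROW_TO_DIRECTION.get(arrow.strip(), 'unknown')


print(wind_direction('↑'))
print(wind_direction('←'))
```

Inside the loop, the if/else could then become a single assignment of `wind_direction(arrow)` to the `wind_dir` key. Returning 'unknown' for unexpected text avoids silently labelling every other case as 'North to South'.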