Web scraping a weather table

Time: 2018-05-19 08:19:50

Tags: python pandas web-scraping beautifulsoup

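I am trying to scrape weather data from the web and need to convert the table to CSV format. However, not every row in the table has the same number of columns filled in. So when I collect the rows like this: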

import urllib3
from bs4 import BeautifulSoup

www = urllib3.PoolManager()      # one connection pool, reused for every request
weather = []
days_in_month = {1: 31, 2: 28}   # January and February 2018

for h in airports:               # airports: list of airport codes, defined elsewhere
    for month, days in days_in_month.items():
        for day in range(1, days + 1):
            url = ("https://www.wunderground.com/history/airport/" + str(h)
                   + "/2018/" + str(month) + "/" + str(day)
                   + "/DailyHistory.html?req_city=&req_state=&req_statename=&reqdb.zip=&reqdb.magic=&reqdb.wmo=")
            page = www.urlopen("GET", url)
            bs = BeautifulSoup(page.data, "lxml")
            x = bs.find('div', {"class": "high-res"})
            for tr in x.findAll('tr'):
                # one list of strings per table row; its length varies from row to row
                weather.append(list(tr.stripped_strings))

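the output CSV file is all over the place: each comma-separated value goes into a new column regardless of the header. Is there a simple way to do this, and to get the dates in a cleaner way? The table is in this format.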

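So I keep appending lists containing the table rows, regardless of how many columns each one has. How can I make sure the data in each column ends up under the correct header? This is the CSV output I got.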

This is what I use to write the data to the CSV file:

import csv

# newline='' keeps the csv module from writing blank lines between rows on Windows
with open('weather.csv', 'a', newline='') as file:
    writer = csv.writer(file)
    writer.writerows(weather)
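On the question of getting the dates in a cleaner way, here is a minimal sketch that lets `datetime` walk the calendar instead of hard-coding month lengths. The helper name `daily_history_urls` and the airport code `"KJFK"` are placeholders chosen for illustration; the URL pattern is the one from the code above.

```python
from datetime import date, timedelta

# URL pattern copied from the question; only the placeholders are new.
BASE = ("https://www.wunderground.com/history/airport/{code}/{y}/{m}/{d}"
        "/DailyHistory.html?req_city=&req_state=&req_statename="
        "&reqdb.zip=&reqdb.magic=&reqdb.wmo=")

def daily_history_urls(code, start, end):
    """Hypothetical helper: yield (date, url) for each day from start to end inclusive."""
    day = start
    while day <= end:
        yield day, BASE.format(code=code, y=day.year, m=day.month, d=day.day)
        day += timedelta(days=1)

# "KJFK" is only an example airport code.
pairs = list(daily_history_urls("KJFK", date(2018, 1, 1), date(2018, 2, 28)))
print(len(pairs))     # 59 days: 31 in January plus 28 in February
print(pairs[0][1])    # URL for 2018-01-01
```

Because each URL comes paired with its `date` object, the date can be added to every scraped row (for example as `day.isoformat()`) so that the CSV records which day a row belongs to.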

1 answer:

Answer 0 (score: 0)

The following seems to have solved my problem of getting the right data under the right column headers:

import pandas as pd

for tr in x.findAll('tr'):
    cols = [ele.text.strip() for ele in tr.findAll('td')]
    # "if ele" drops empty cells from the row
    weather.append([ele for ele in cols if ele])

result = pd.DataFrame(weather, columns=["Time(EST)", "Temp.", "Windchill", "Dew Point",
                                        "Humidity", "Pressure", "Visibility", "Wind Dir",
                                        "Wind Speed", "Gust Speed", "Precip", "Events",
                                        "Conditions"])
  

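However, I have run into a new problem: when I scrape the text, some values in the table are missing, and the code skips them and carries on, so the following values end up under the wrong headers. Please help.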
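One likely cause of this shift is the `if ele` filter in the loop above: it removes empty cells, so every value after a blank cell moves one column to the left. Below is a minimal sketch that keeps empty cells in place and pads short rows. It assumes the page emits an empty `<td>` for a missing value, which has not been checked against the real site. The function name `rows_from_table` and the sample markup are made up for illustration, and `"html.parser"` is used only so the sketch does not depend on lxml.

```python
from bs4 import BeautifulSoup

def rows_from_table(html, n_columns):
    """Hypothetical helper: one list of cell texts per <tr>, empty cells kept."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        # No "if" filter here, so an empty <td> stays as "" in its own position.
        cells = [td.get_text(" ", strip=True) for td in tr.find_all("td")]
        if not cells:   # header rows made of <th> have no <td> cells; skip them
            continue
        # Pad rows that are shorter than the header, then cut to the header width.
        cells += [""] * (n_columns - len(cells))
        rows.append(cells[:n_columns])
    return rows

# Made-up sample markup with three columns, for illustration only.
sample = """
<table>
  <tr><th>Time</th><th>Temp.</th><th>Windchill</th></tr>
  <tr><td>12:51 AM</td><td>35.1</td><td>28.2</td></tr>
  <tr><td>1:51 AM</td><td></td><td>27.0</td></tr>
</table>
"""
rows = rows_from_table(sample, 3)
for row in rows:
    print(row)
```

Since every row now has the same length, the result can be passed to `pd.DataFrame(rows, columns=[...])` with the column list from the answer. Note that padding only appends blanks at the end of a row; if the site leaves out a `<td>` entirely in the middle of a row, the position of the gap cannot be recovered this way.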