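I'm new to Python and I'm trying to extract weather data using BeautifulSoup and Requests. I have a question about how arrays (lists) work in Python, and about the best way to put the scraped data into a well-formatted structure.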
from lxml import html
from bs4 import BeautifulSoup
import requests
url = "http://climate.weather.gc.ca/climateData/hourlydata_e.html?timeframe=1&Prov=ON&StationID=31688&hlyRange=2002-06-04|2016-03-17&Year=2016&Month=3&Day=15"
r = requests.get(url)
page = r.content
soup = BeautifulSoup(page, "lxml")
table = soup.find('table', class_='wet-boew-zebra span-8 ')
rows = table.find_all('tr')
data = []
for item in rows:
    data.append(item.text)
print(data)
My code is above. The trouble started with what I originally tried:
for item in rows:
    print(item.text)
With that I see all of the data, since it is printed as the loop walks through the items. Then I tried this:
for item in rows:
    data = (item.text)
print(data)
That only shows the data from the last row of the table. Then I tried append (as in the main code above) and got something like this:
[u"\n\nTempDefinition\xb0C\nDew Point TempDefinition\xb0C\nRel HumDefinition%\nWind DirDefinition10's degWind SpdDefinitionkm/h\nVisibilityDefinitionkm\nStn PressDefinitionkPa\nHmdxDefinition\nWind ChillDefinition\nWeatherDefinition\n", u'TIME', u'\n00:00\n9.5\n6.3\n81\nLegendMM\nLegendMM\n\n998\n\n\nLegendNANA\n', u'\n01:00\n9.0\n6.4\n84\nLegendMM\nLegendMM\n\n993\n\n\nLegendNANA\n', u'\n02:00\n8.0\n6.5\n91\nLegendMM\nLegendMM\n\n996\n\n\nLegendNANA\n', u'\n03:00\n7.6\n6.5\n92\nLegendMM\nLegendMM\n\n99.31\n\n\nLegendNANA\n', u'\n04:00\n7.2\n6.1\n93\nLegendMM\nLegendMM\n\n99303\n\n\nLegendNANA\n', u'\n05:00\n7.1\n6.2\n94\nLegendMM\nLegendMM\n\n995\n\n\nLegendNANA\n', u'\n06:00\n7.6\n6.8\n95\nLegendMM\nLegendMM\n\n9999\n\n\nLegendNANA\n', u'\n07:00\n7.6\n6.8\n95\nLegendMM\nLegendMM\n\n9940\n\n\nLegendNANA\n', u'\n08:00\n\n\n\n\n\n\n\n\n\n\n', u'\n09:00\n7.3\n6.0\n92\nLegendMM\nLegendMM\n\n9999\n\n\nLegendNANA\n', u'\n10:00\n7.4\n6.3\n92\nLegendMM\nLegendMM\n\n99.57\n\n\nLegendNANA\n', u'\n11:00\n7.3\n5.5\n88\nLegendMM\nLegendMM\n\n995\n\n\nLegendNANA\n', u'\n12:00\n7.7\n5.2\n84\nLegendMM\nLegendMM\n\n99.61\n\n\nLegendNANA\n', u'\n13:00\n7.9\n4.6\n80\nLegendMM\nLegendMM\n\n99503\n\n\nLegendNANA\n', u'\n14:00\n9.6\n5.3\n75\nLegendMM\nLegendMM\n\n996\n\n\nLegendNANA\n', u'\n15:00\n10.0\n5.8\n75\nLegendMM\nLegendMM\n\n995\n\n\nLegendNANA\n', u'\n16:00\n10.0\n5.1\n72\nLegendMM\nLegendMM\n\n995\n\n\nLegendNANA\n', u'\n17:00\n9.6\n4.7\n72\nLegendMM\nLegendMM\n\n99.67\n\n\nLegendNANA\n', u'\n18:00\n8.6\n5.3\n80\nLegendMM\nLegendMM\n\n99992\n\n\nLegendNANA\n', u'\n19:00\n7.9\n4.1\n77\nLegendMM\nLegendMM\n\n99.82\n\n\nLegendNANA\n', u'\n20:00\n7.9\n4.3\n78\nLegendMM\nLegendMM\n\n99.88\n\n\nLegendNANA\n', u'\n21:00\n7.7\n4.2\n79\nLegendMM\nLegendMM\n\n999.89\n\n\nLegendNANA\n', u'\n22:00\n7.4\n4.0\n79\nLegendMM\nLegendMM\n\n99.86\n\n\nLegendNANA\n', u'\n23:00\n7.1\n4.0\n81\nLegendMM\nLegendMM\n\n99.85\n\n\nLegendNANA\n']
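The difference between the assignment loop and the append loop can be reproduced without any scraping. This is only a minimal sketch: the strings in `row_texts` are made-up stand-ins for the `item.text` values, and the names `last_only` and `collected` are hypothetical.

```python
# Hypothetical stand-ins for the item.text values of three table rows.
row_texts = ["00:00 9.5 6.3", "01:00 9.0 6.4", "02:00 8.0 6.5"]

# Plain assignment rebinds the name on every pass,
# so only the value from the final iteration survives.
last_only = None
for text in row_texts:
    last_only = text

# append() adds each value to the same list, so every row is kept.
collected = []
for text in row_texts:
    collected.append(text)

print(last_only)
print(collected)
```

So the append version does keep every row; what remains unstructured is each row's text, which is still one long string with the cell values separated by newlines.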
To summarize: what is the best way to put the data into an array so that I can easily manipulate and analyze what I've collected?
Answer 0 (score: 0)
Use the row objects returned by BeautifulSoup to extract the td tags:
data = []
for row in rows:
    # one list of cell strings per table row
    data.append([c.text for c in row.find_all('td')])
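Once each row is a list of cell strings, it may help to turn the rows into dictionaries with numeric values, which are easier to analyze. The following is a rough sketch under several assumptions: the sample values in `rows_of_cells` are made up, the column names in `columns` are invented for illustration rather than taken from the page, the time is assumed to be the first cell, and `to_number` is a hypothetical helper.

```python
# Hypothetical cell lists, shaped like the output of
# [c.text for c in row.find_all('td')] for three data rows.
rows_of_cells = [
    ["00:00", "9.5", "6.3", "81"],
    ["01:00", "9.0", "6.4", "84"],
    ["08:00", "", "", ""],  # an hour with no readings
]

# Column names chosen for this sketch; they are not taken from the page.
columns = ["time", "temp_c", "dew_point_c", "rel_hum_pct"]


def to_number(text):
    """Return the cell as a float, or None when it is empty or not numeric."""
    try:
        return float(text.strip())
    except ValueError:
        return None


records = []
for cells in rows_of_cells:
    if len(cells) != len(columns):
        continue  # skip header rows or anything with an unexpected shape
    record = {"time": cells[0].strip()}
    for name, cell in zip(columns[1:], cells[1:]):
        record[name] = to_number(cell)
    records.append(record)

# Example analysis: average temperature over the hours that have a reading.
temps = [r["temp_c"] for r in records if r["temp_c"] is not None]
print(records[0])
print(sum(temps) / len(temps))
```

Missing readings are stored as None rather than 0 so that they do not distort averages. If pandas is available, a list of dictionaries like `records` can also be passed to `pandas.DataFrame` for further analysis.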