How to efficiently append data to an array? (Python, BS4, Requests)

Time: 2016-03-18 20:16:58

Tags: python arrays beautifulsoup python-requests

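I'm new to Python and I'm trying to extract weather data using BeautifulSoup and Requests. I have a question about how arrays work in Python, and about the best way to insert the scraped data into a well-formatted structure.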

    from lxml import html
    from bs4 import BeautifulSoup
    import requests

    url = "http://climate.weather.gc.ca/climateData/hourlydata_e.html?timeframe=1&Prov=ON&StationID=31688&hlyRange=2002-06-04|2016-03-17&Year=2016&Month=3&Day=15"
    r = requests.get(url)
    page = r.content
    soup = BeautifulSoup(page, "lxml")

    table = soup.find('table', class_='wet-boew-zebra span-8 ')
    rows = table.find_all('tr')

    data = []

    for item in rows:
        data.append(item.text)
    print(data)

My code is above. The main trouble I'm having is this. What I initially tried was:

    for item in rows:
        print(item.text)

and I would see all the data as it looped through the items. Then I tried this:

    for item in rows:
        data = (item.text)
    print(data)

This only showed the data from the last row of the table. Then I tried appending (as in the main code above) and got something like this:

  

    [u"\n\nTemp.Definition\xb0C\nDew Point Temp.Definition\xb0C\nRel HumDefinition%\nWind DirDefinition10's deg\nWind SpdDefinitionkm/h\nVisibilityDefinitionkm\nStn PressDefinitionkPa\nHmdxDefinition\nWind ChillDefinition\nWeatherDefinition\n",
     u'TIME',
     u'\n00:00\n9.5\n6.3\n81\nLegendMM\nLegendMM\n\n998\n\n\nLegendNANA\n',
     u'\n01:00\n9.0\n6.4\n84\nLegendMM\nLegendMM\n\n993\n\n\nLegendNANA\n',
     u'\n02:00\n8.0\n6.5\n91\nLegendMM\nLegendMM\n\n996\n\n\nLegendNANA\n',
     u'\n03:00\n7.6\n6.5\n92\nLegendMM\nLegendMM\n\n99.31\n\n\nLegendNANA\n',
     u'\n04:00\n7.2\n6.1\n93\nLegendMM\nLegendMM\n\n99303\n\n\nLegendNANA\n',
     u'\n05:00\n7.1\n6.2\n94\nLegendMM\nLegendMM\n\n995\n\n\nLegendNANA\n',
     u'\n06:00\n7.6\n6.8\n95\nLegendMM\nLegendMM\n\n9999\n\n\nLegendNANA\n',
     u'\n07:00\n7.6\n6.8\n95\nLegendMM\nLegendMM\n\n9940\n\n\nLegendNANA\n',
     u'\n08:00\n\n\n\n\n\n\n\n\n\n',
     u'\n09:00\n7.3\n6.0\n92\nLegendMM\nLegendMM\n\n9999\n\n\nLegendNANA\n',
     u'\n10:00\n7.4\n6.3\n92\nLegendMM\nLegendMM\n\n99.57\n\n\nLegendNANA\n',
     u'\n11:00\n7.3\n5.5\n88\nLegendMM\nLegendMM\n\n995\n\n\nLegendNANA\n',
     u'\n12:00\n7.7\n5.2\n84\nLegendMM\nLegendMM\n\n99.61\n\n\nLegendNANA\n',
     u'\n13:00\n7.9\n4.6\n80\nLegendMM\nLegendMM\n\n99503\n\n\nLegendNANA\n',
     u'\n14:00\n9.6\n5.3\n75\nLegendMM\nLegendMM\n\n996\n\n\nLegendNANA\n',
     u'\n15:00\n10.0\n5.8\n75\nLegendMM\nLegendMM\n\n995\n\n\nLegendNANA\n',
     u'\n16:00\n10.0\n5.1\n72\nLegendMM\nLegendMM\n\n995\n\n\nLegendNANA\n',
     u'\n17:00\n9.6\n4.7\n72\nLegendMM\nLegendMM\n\n99.67\n\n\nLegendNANA\n',
     u'\n18:00\n8.6\n5.3\n80\nLegendMM\nLegendMM\n\n99992\n\n\nLegendNANA\n',
     u'\n19:00\n7.9\n4.1\n77\nLegendMM\nLegendMM\n\n99.82\n\n\nLegendNANA\n',
     u'\n20:00\n7.9\n4.3\n78\nLegendMM\nLegendMM\n\n99.88\n\n\nLegendNANA\n',
     u'\n21:00\n7.7\n4.2\n79\nLegendMM\nLegendMM\n\n999.89\n\n\nLegendNANA\n',
     u'\n22:00\n7.4\n4.0\n79\nLegendMM\nLegendMM\n\n99.86\n\n\nLegendNANA\n',
     u'\n23:00\n7.1\n4.0\n81\nLegendMM\nLegendMM\n\n99.85\n\n\nLegendNANA\n']

To sum up, what is the best way to insert the data into an array so that I can easily manipulate and analyze what I've collected?

1 answer:

Answer 0 (score: 0)

Use the row objects returned by BeautifulSoup to extract the td tags, so that each row becomes its own list of cell values:

    for row in rows:
        data.append([c.text for c in row.find_all('td')])
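
For illustration, here is a minimal self-contained sketch of the same idea that can be run without hitting the weather site. The inline HTML, its values, and the names `SAMPLE_HTML` and `parse_rows` are made up for this example and are not the real page. It assumes Python 3 and uses the built-in `html.parser` so that lxml is not required. `get_text(strip=True)` is used in place of `.text` to drop the surrounding whitespace and newlines seen in the question's output.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the weather table; the real page has more columns.
SAMPLE_HTML = """
<table>
  <tr><th>TIME</th><th>Temp</th><th>Dew Point</th></tr>
  <tr><td>00:00</td><td> 9.5 </td><td>6.3</td></tr>
  <tr><td>01:00</td><td>9.0</td><td></td></tr>
</table>
"""


def parse_rows(html):
    """Return one list of stripped cell strings per <tr> that has <td> cells."""
    soup = BeautifulSoup(html, "html.parser")
    data = []
    for row in soup.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
        if cells:  # rows made only of <th> cells yield an empty list; skip them
            data.append(cells)
    return data


data = parse_rows(SAMPLE_HTML)
print(data)
```

Each element of `data` is then a list such as `['00:00', '9.5', '6.3']`, so a single value can be reached with two indexes, for example `data[0][1]` for the first row's temperature. Empty cells come back as empty strings.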

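
Since the goal is to manipulate and analyze the values, the cell strings can be converted to numbers once each row is a list. The sketch below is only one possible approach, using the standard library alone. The sample rows, the column order (time, temperature, dew point), and the names `to_float`, `cell_rows`, and `records` are assumptions made for illustration.

```python
def to_float(text):
    """Convert a cell string to float, or None if it is empty or not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


# Hypothetical rows shaped like the answer's output: [time, temp, dew point]
cell_rows = [["00:00", "9.5", "6.3"], ["01:00", "9.0", ""], ["02:00", "8.0", "6.5"]]

# Keep the time as text and convert the remaining cells to numbers.
records = [[row[0]] + [to_float(cell) for cell in row[1:]] for row in cell_rows]

# Example analysis: mean temperature over the rows that have a value.
temps = [rec[1] for rec in records if rec[1] is not None]
mean_temp = sum(temps) / len(temps)

print(records)
print(mean_temp)
```

Missing readings become `None` instead of raising an error, which keeps every row the same length and makes it easy to filter them out before computing statistics.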