Getting all tables from HTML with BeautifulSoup using nested loops

Date: 2013-04-12 16:45:52

Tags: python loops web-scraping beautifulsoup

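I'm trying to get all the tables from a site (the TSN NHL scores page the script below requests) using nested loops. I'm almost there, but I'm still unsure how to loop over several tables that share the same class identifier. I get the following error on line 26, for s in soup.findALL ("table", { "class" : "boxScore"}):

SyntaxError: invalid syntax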

My script:

import datetime
import urllib
from bs4 import BeautifulSoup
import urllib2


day = int(datetime.datetime.now().strftime("%d"))-1

month = datetime.datetime.now().strftime("%B")
year = datetime.datetime.now().strftime("%Y")
file_name = "/users/ripple/NHL.csv"
file = open(file_name,"w")
url = "http://www.tsn.ca/nhl/scores/?date=" + month + "/" + str(day) + "/" + year
print 'Grabbing from: ' + url + '...\n'
try:
        r = urllib2.urlopen(url)
except urllib2.URLError as e:
           r = e
if r.code in (200, 401):    
    #get the table data from the page
    data = urllib.urlopen(url).read()
    #send to beautiful soup
    soup = BeautifulSoup(data)
    print soup
    soup = soup.findALL ("table", { "class" : "boxScore"})
    for s in soup.findALL ("table", { "class" : "boxScore"})
        table = soup.find("table",{ "class" : "boxScore"})
        for tr in table.findAll('tr')[2:]:
            col = tr.findAll('td')
            team = col[0].get_text().encode('ascii','ignore').replace(" ","")
            firstp = col[1].get_text().encode('ascii','ignore').replace(" ","")
            secondp = col[2].get_text().encode('ascii','ignore').replace(" ","")
            thirdp = col[3].get_text().encode('ascii','ignore').replace(" ","")
            final = col[4].get_text().encode('ascii','ignore').replace(" ","")
            record = team + ',' + final + '\n'
            print record
            file.write(record)
else: 
    print str(i) + " NO GAMES"
file.close()
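Separate from the loop error, the date arithmetic at the top of the script has an edge case: int(day) - 1 gives day 0 on the first of a month and keeps the current month name. A minimal Python 3 sketch that subtracts a timedelta instead, so month and year roll back correctly; scores_url is a hypothetical helper name, and it simply reuses the month/day/year URL layout the script builds.

```python
import datetime


def scores_url(today=None):
    """Build the scores URL for the day before `today` (hypothetical helper).

    Subtracting a timedelta rolls the month and year back correctly,
    unlike int(day) - 1, which yields day 0 on the 1st of a month.
    """
    if today is None:
        today = datetime.date.today()
    yesterday = today - datetime.timedelta(days=1)
    # Same month/day/year layout as the script, e.g. "April/11/2013".
    # %B follows the current locale; the default C locale gives English names.
    return "http://www.tsn.ca/nhl/scores/?date=%s/%d/%s" % (
        yesterday.strftime("%B"), yesterday.day, yesterday.strftime("%Y"))


print(scores_url(datetime.date(2013, 4, 12)))
print(scores_url(datetime.date(2013, 3, 1)))
```

For 2013-03-01 this yields February/28/2013, where the original arithmetic would produce March/0/2013.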

1 answer:

Answer 0 (score: 2)

In Python, a for loop header must end with a colon (:). Line 26 is missing it, and that is what raises the SyntaxError.
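One way to confirm that the missing colon alone causes the error is to compile both versions of the loop header with Python's built-in compile(), which checks syntax without running anything. This stdlib-only sketch needs neither BeautifulSoup nor network access; the snippet strings and the is_valid_python helper are illustrative.

```python
def is_valid_python(source):
    """Return True if `source` compiles, False if it raises SyntaxError."""
    try:
        # compile() only parses; names like `soup` are never looked up.
        compile(source, "<snippet>", "exec")
    except SyntaxError:
        return False
    return True


# Loop header as written in the question (no trailing colon).
broken = 'for s in soup.findALL("table", {"class": "boxScore"})\n    pass\n'
# Same header with the colon added.
fixed = 'for s in soup.findALL("table", {"class": "boxScore"}):\n    pass\n'

print(is_valid_python(broken))  # False
print(is_valid_python(fixed))   # True
```

The fixed version compiles even though it still says findALL: a misspelled method name is not a syntax problem and only fails when the line actually runs.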

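Also: the method is findAll(), not findALL(). Method names are case-sensitive. (In bs4, find_all() is the preferred spelling; findAll() is kept as an alias.)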
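Even with the colon and the method name fixed, the loop in the question has two more problems: soup = soup.findAll(...) replaces the soup with a list of tables (a bs4 ResultSet), so the next soup.findAll call raises an AttributeError, and table = soup.find(...) inside the loop would always return the first table rather than the one being iterated. A sketch of the nested loop that iterates each boxScore table directly, assuming Python 3 with bs4 installed and the built-in "html.parser"; the sample HTML is made up to mimic a table with two header rows followed by team, 1st, 2nd, 3rd and final columns, and is not taken from TSN.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two boxScore tables, each with two header rows
# followed by one row per team (team, 1st, 2nd, 3rd, final).
SAMPLE_HTML = """
<table class="boxScore">
  <tr><th colspan="5">Game 1</th></tr>
  <tr><th>Team</th><th>1</th><th>2</th><th>3</th><th>T</th></tr>
  <tr><td>Boston</td><td>1</td><td>0</td><td>2</td><td>3</td></tr>
  <tr><td>Toronto</td><td>0</td><td>1</td><td>0</td><td>1</td></tr>
</table>
<table class="boxScore">
  <tr><th colspan="5">Game 2</th></tr>
  <tr><th>Team</th><th>1</th><th>2</th><th>3</th><th>T</th></tr>
  <tr><td>Montreal</td><td>2</td><td>1</td><td>0</td><td>3</td></tr>
  <tr><td>Ottawa</td><td>1</td><td>1</td><td>1</td><td>3</td></tr>
</table>
"""


def cell_text(td):
    """Cell text as ASCII with spaces removed, mirroring the question.

    In Python 3, .encode() returns bytes, so decode back to str
    before concatenating.
    """
    return td.get_text().encode("ascii", "ignore").decode("ascii").replace(" ", "")


def extract_records(html):
    soup = BeautifulSoup(html, "html.parser")
    records = []
    # Outer loop: every table with class "boxScore"
    # (class_ is equivalent to {"class": "boxScore"}).
    for table in soup.find_all("table", class_="boxScore"):
        # Inner loop: skip the two header rows, as the question's [2:] does.
        for tr in table.find_all("tr")[2:]:
            cols = tr.find_all("td")
            team, final = cell_text(cols[0]), cell_text(cols[4])
            records.append(team + "," + final)
    return records


for record in extract_records(SAMPLE_HTML):
    print(record)
```

To get the CSV file from the question, write each record followed by "\n" to the output file inside the same inner loop.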