Question

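I'm trying to use nested loops to get all the tables from this site. I'm almost there, but I'm still unsure how to loop over several tables that share the same class identifier. I get an error at line 26 : for s in soup.findALL ("table", { "class" : "boxScore"})

The error is:

SyntaxError: invalid syntax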

My script:

import datetime
import urllib
from bs4 import BeautifulSoup
import urllib2


day = int(datetime.datetime.now().strftime("%d"))-1

month = datetime.datetime.now().strftime("%B")
year = datetime.datetime.now().strftime("%Y")
file_name = "/users/ripple/NHL.csv"
file = open(file_name,"w")
url = "http://www.tsn.ca/nhl/scores/?date=" + month + "/" + str(day) + "/" + year
print 'Grabbing from: ' + url + '...\n'
try:
        r = urllib2.urlopen(url)
except urllib2.URLError as e:
           r = e
if r.code in (200, 401):    
    #get the table data from the page
    data = urllib.urlopen(url).read()
    #send to beautiful soup
    soup = BeautifulSoup(data)
    print soup
    soup = soup.findALL ("table", { "class" : "boxScore"})
    for s in soup.findALL ("table", { "class" : "boxScore"})
        table = soup.find("table",{ "class" : "boxScore"})
        for tr in table.findAll('tr')[2:]:
            col = tr.findAll('td')
            team = col[0].get_text().encode('ascii','ignore').replace(" ","")
            firstp = col[1].get_text().encode('ascii','ignore').replace(" ","")
            secondp = col[2].get_text().encode('ascii','ignore').replace(" ","")
            thirdp = col[3].get_text().encode('ascii','ignore').replace(" ","")
            final = col[4].get_text().encode('ascii','ignore').replace(" ","")
            record = team + ',' + final + '\n'
            print record
            file.write(record)
else: 
    print str(i) + " NO GAMES"
file.close()

Answer 1

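In Python, a for statement must end with a colon ':'. The line the interpreter complains about is missing it, which is what triggers the SyntaxError.

Also: the API method is findAll(), not findALL(). Method names are case-sensitive.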
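The two fixes above can be sketched as follows. This is only a minimal illustration, not the asker's full script: the SAMPLE_HTML string and the extract_records helper are hypothetical stand-ins for the real scores page and file-writing code, and it uses find_all, which is the current bs4 spelling of findAll. It also assumes, as the original script does, that each table has two header rows followed by rows of five cells. Note that the inner loop reads rows from the table yielded by the outer loop, rather than calling soup.find again, which would return the first table on every pass.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real scores page.
SAMPLE_HTML = """
<table class="boxScore">
  <tr><th colspan="5">Game 1</th></tr>
  <tr><th>Team</th><th>1</th><th>2</th><th>3</th><th>F</th></tr>
  <tr><td>Boston</td><td>1</td><td>0</td><td>2</td><td>3</td></tr>
  <tr><td>Toronto</td><td>0</td><td>1</td><td>0</td><td>1</td></tr>
</table>
<table class="boxScore">
  <tr><th colspan="5">Game 2</th></tr>
  <tr><th>Team</th><th>1</th><th>2</th><th>3</th><th>F</th></tr>
  <tr><td>Montreal</td><td>0</td><td>1</td><td>1</td><td>2</td></tr>
  <tr><td>Ottawa</td><td>2</td><td>1</td><td>1</td><td>4</td></tr>
</table>
"""


def extract_records(html):
    """Return one "team,final" string per data row across all boxScore tables."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    # Outer loop: every table with class "boxScore" (note the trailing colon).
    for table in soup.find_all("table", {"class": "boxScore"}):
        # Inner loop: rows of this particular table, skipping two header rows.
        for tr in table.find_all("tr")[2:]:
            col = tr.find_all("td")
            team = col[0].get_text().replace(" ", "")
            final = col[4].get_text().replace(" ", "")
            records.append(team + "," + final)
    return records


records = extract_records(SAMPLE_HTML)
for record in records:
    print(record)
```

Keeping the result of find_all in the loop variable, instead of rebinding soup to it as the original script does, leaves the parsed document available for further queries.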
