Question

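Good afternoon, community! I need help writing a parser. I have only just started programming in Python 3, so I may be missing something. The task is this: a site has a table of football teams. Using Requests and BeautifulSoup I was able to get the source of that table into the firsttable variable, and print shows all the data I need. But when I try to output it as a list in this form: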

10:00 Team 1 Team 2
11:00 Team 3 Team 4
12:00 Team 5 Team 6

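and so on, I only get the first value. I tried a while loop (for example, while i < 10); it repeated the first value from the table 10 times but did not parse the rest. What am I doing wrong?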

def get_data(html):
    soup = BeautifulSoup(html, 'lxml')
    firsttable = soup.findAll('table', class_='predictionsTable')[0]
    print(firsttable) #Here, all the data that I need is displayed in the console as html source

    for scrap in firsttable:

        try:
            hometeam = scrap.find('td', class_='COL-3').text
        except:
            hometeam = 'Hometeam Error'

        try:
            awayteam = scrap.find('td', class_='COL-5').text
        except:
            awayteam = 'Away Team Error'

        try:
            btts = scrap.find('td', class_='COL-10').text
        except:
            btts = 'BTTS Score Error'

        datenow = str(datetime.date.today())

        print(datenow,hometeam,awayteam,btts)

Answer 1

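The second argument to the BeautifulSoup constructor is a string that names the parser to use.
To parse the HTML with the parser built into Python's standard library, pass 'html.parser' as that second argument.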

soup = BeautifulSoup(html, 'lxml') => soup = BeautifulSoup(html, 'html.parser')
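
As a minimal sketch of that constructor call, the snippet below parses a small made-up fragment with the built-in parser. The sample HTML is hypothetical and only borrows the class names from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical one-row table; only the class names come from the question.
html = '<table class="predictionsTable"><tr><td class="COL-3">Team 1</td></tr></table>'

# The second argument selects the parser; 'html.parser' ships with Python.
soup = BeautifulSoup(html, 'html.parser')

first_cell = soup.find('td', class_='COL-3').text
print(first_cell)  # Team 1
```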

Answer 2

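The loop for scrap in firsttable iterates over the table's direct children (the table body as a whole, for example), not over its individual rows, so each scrap.find call returns only the first matching cell. That is why only the first row is found. Instead of that loop, I recommend collecting the cells with the find_all method. This worked for me: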

import datetime

import requests
from bs4 import BeautifulSoup

url = 'https://www.over25tips.com/both-teams-to-score-tips/'
soup = BeautifulSoup(requests.get(url).content, 'lxml')

# Take the first predictions table, then collect each column separately.
firsttable = soup.find_all('table', class_='predictionsTable')[0]
hometeams = [x.text for x in firsttable.find_all('td', {'class': 'COL-3 right-align'})]
awayteams = [x.text for x in firsttable.find_all('td', {'class': 'COL-5 left-align'})]
btts = [x.text for x in firsttable.find_all('td', {'class': 'COL-10 hide-on-small-only'})]
datenow = str(datetime.date.today())

# Print the columns side by side, one match per line.
for hometeam, awayteam, score in zip(hometeams, awayteams, btts):
    print(datenow, hometeam, awayteam, score)
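
A row-by-row variant of the same idea can be sketched as follows. It is not part of the original answer: the sample HTML, the cell values, and the helpers cell_text, parse_rows and direct_child_tags are made up for illustration, and the sketch assumes that every match is a tr whose cells carry the COL-3, COL-5 and COL-10 classes from the question. Reading all three cells from the same row keeps them aligned even when a cell is missing.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical stand-in for the real page; only the class names come from the question.
SAMPLE_HTML = """
<table class="predictionsTable">
  <tbody>
    <tr><td class="COL-3 right-align">Team 1</td><td class="COL-5 left-align">Team 2</td><td class="COL-10 hide-on-small-only">61%</td></tr>
    <tr><td class="COL-3 right-align">Team 3</td><td class="COL-5 left-align">Team 4</td><td class="COL-10 hide-on-small-only">58%</td></tr>
  </tbody>
</table>
"""


def cell_text(row, css_class, default):
    """Return the stripped text of the first td with css_class, or default."""
    cell = row.find('td', class_=css_class)
    return cell.get_text(strip=True) if cell is not None else default


def parse_rows(html):
    """Return one (hometeam, awayteam, btts) tuple per data row of the table."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find('table', class_='predictionsTable')
    rows = []
    for tr in table.find_all('tr'):
        # Skip header or spacer rows that have no home-team cell.
        if tr.find('td', class_='COL-3') is None:
            continue
        rows.append((
            cell_text(tr, 'COL-3', 'Hometeam Error'),
            cell_text(tr, 'COL-5', 'Away Team Error'),
            cell_text(tr, 'COL-10', 'BTTS Score Error'),
        ))
    return rows


def direct_child_tags(html):
    """Return the tag names that a plain 'for child in table' loop would visit."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find('table', class_='predictionsTable')
    # Iterating a Tag yields its direct children, including whitespace strings.
    return [child.name for child in table if isinstance(child, Tag)]


for hometeam, awayteam, btts in parse_rows(SAMPLE_HTML):
    print(hometeam, awayteam, btts)
```

For this sample, direct_child_tags(SAMPLE_HTML) returns only the tbody element, which illustrates why looping over the table itself never reaches the individual rows.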

How do I get all the rows of the table, not just the first one?
