Web scraping with bs4 in Python: how to display football fixtures

Time: 2021-02-01 19:33:26

Tags: python web beautifulsoup screen-scraping

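I'm a beginner in Python and am trying to create a program that scrapes the football/soccer fixtures from skysports.com and sends them to my phone by SMS through Twilio. I've left out the SMS code because I have that figured out, so here is the web scraping code I'm stuck on so far: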

import requests
from bs4 import BeautifulSoup

URL = "https://www.skysports.com/football-fixtures"
page = requests.get(URL)

results = BeautifulSoup(page.content, "html.parser")

d = defaultdict(list)

comp = results.find('h5', {"class": "fixres__header3"})
team1 = results.find('span', {"class": "matches__item-col matches__participant matches__participant--side1"})
date = results.find('span', {"class": "matches__date"})
team2 = results.find('span', {"class": "matches__item-col matches__participant matches__participant--side2"})

for ind in range(len(d)):
    d['comp'].append(comp[ind].text)
    d['team1'].append(team1[ind].text)
    d['date'].append(date[ind].text)
    d['team2'].append(team2[ind].text) 

1 Answer:

Answer 0 (score: 1)

The following should solve your problem:

import requests
from bs4 import BeautifulSoup

response = requests.get('https://www.skysports.com/football-fixtures')
soup = BeautifulSoup(response.text, features="html.parser")

# Collect every team name once; looping over the date headers as well
# would append the same names again for each date.
teams = []
for i in soup.find_all(class_="swap-text--bp30")[1:]:  # skips the first one because that's a heading
    teams.append(i.text)

# The first date header on the page
date = soup.find(class_="fixres__header2").text
print(date)
teams = [i.strip('\n') for i in teams]
# Step through the names two at a time; stop before an unpaired last name
for x in range(0, len(teams) - 1, 2):
    print(teams[x] + " vs " + teams[x+1])

Let me explain further what I did: all the team names have this class name: swap-text--bp30

So we can use find_all to extract every element that has that class name.
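To make the find_all step concrete, here is a minimal sketch that runs against a small inline HTML string instead of the live page. The markup in SAMPLE_HTML is made up for illustration and is only assumed to resemble the real page; only the class name comes from the answer above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; the real page is more complex.
SAMPLE_HTML = """
<span class="swap-text--bp30">Heading</span>
<span class="swap-text--bp30">
Arsenal
</span>
<span class="swap-text--bp30">
Chelsea
</span>
<span class="other">ignored</span>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all returns every matching tag, while find returns only the first one.
matches = soup.find_all(class_="swap-text--bp30")
first = soup.find(class_="swap-text--bp30")

# Skip the first match (the heading) and keep the text of the rest.
names = [tag.text for tag in matches[1:]]
```

Because find returns a single tag rather than a list, indexing its result with comp[ind], as in the question, does not walk through the fixtures; find_all is the call that yields one entry per match.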

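Once we have the results, we can put them into a list, "teams = []", appending each one inside the for loop with "teams.append(i.text)". The ".text" attribute strips the HTML and leaves only the text.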

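Then we can get rid of the "\n" around each string in the list by stripping it, and print the strings two at a time. The final output is the date followed by one "Team A vs Team B" line per fixture.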
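The strip-and-pair step can also be tried on its own, without any network access. The sketch below uses a hypothetical helper, pair_fixtures, and made-up sample names; it assumes the names arrive in home, away, home, away order, as the answer's loop does.

```python
def pair_fixtures(raw_names):
    """Strip surrounding newlines and join names two at a time."""
    names = [name.strip("\n") for name in raw_names]
    # Stop at len(names) - 1 so an unpaired trailing name is ignored
    # instead of raising IndexError.
    return [
        f"{names[i]} vs {names[i + 1]}"
        for i in range(0, len(names) - 1, 2)
    ]


# Made-up input with an odd number of names to show the guard at work.
fixtures = pair_fixtures(
    ["\nArsenal\n", "\nChelsea\n", "\nLeeds\n", "\nEverton\n", "\nFulham\n"]
)
for line in fixtures:
    print(line)
```

Returning a list instead of printing directly keeps the result easy to reuse, for example as the body of the SMS message.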

Edit: to get the league names, we do almost the same thing:

league = []
# Collect the text of every competition header; a single loop is enough,
# since an outer loop over the dates would only add duplicates.
for i in soup.find_all(class_="fixres__header3"):
    league.append(i.text)

Strip the newlines from the list and create another one:

league = [i.strip('\n') for i in league]
final = []

Then add the last bit of code, which basically just prints a league and then the fixtures, over and over:

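# Pair the team names two at a time, as in the first snippet
for x in range(0, len(teams) - 1, 2):
    final.append(teams[x] + " vs " + teams[x+1])

# Prints the full fixture list under each league name;
# the fixtures are not grouped by league here.
for name in league:
    print(name)
    for fixture in final:
        print(fixture)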
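If each fixture should appear only under its own league, one possible approach is to walk the league headers and team names together in document order. This is a sketch rather than a tested solution for the live site: SAMPLE_HTML is made-up markup, fixtures_by_league is a hypothetical helper, and it assumes every team name follows its league header in the page source.

```python
from collections import defaultdict

from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page structure may differ.
SAMPLE_HTML = """
<h5 class="fixres__header3">Premier League</h5>
<span class="swap-text--bp30">Arsenal</span>
<span class="swap-text--bp30">Chelsea</span>
<h5 class="fixres__header3">Championship</h5>
<span class="swap-text--bp30">Leeds</span>
<span class="swap-text--bp30">Derby</span>
<span class="swap-text--bp30">Fulham</span>
<span class="swap-text--bp30">Stoke</span>
"""


def fixtures_by_league(html):
    soup = BeautifulSoup(html, "html.parser")
    names = defaultdict(list)
    current = None
    # select() returns matches in document order, so each team name is
    # filed under the most recent league header seen before it.
    for tag in soup.select(".fixres__header3, .swap-text--bp30"):
        text = tag.get_text(strip=True)
        if "fixres__header3" in tag.get("class", []):
            current = text
        elif current is not None:
            names[current].append(text)
    # Pair the names two at a time within each league.
    return {
        league: [f"{n[i]} vs {n[i + 1]}" for i in range(0, len(n) - 1, 2)]
        for league, n in names.items()
    }


grouped = fixtures_by_league(SAMPLE_HTML)
for league_name, matches in grouped.items():
    print(league_name)
    for match in matches:
        print("  " + match)
```

Any team name that appears before the first league header is skipped, because current is still None at that point.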