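I am trying to scrape with Python and Beautiful Soup. I want to get data from the Premier League; see the page below. With the code below, what I get back does not match the data on the website. Please take a look and help. I suspect this may be caused by pagination. I want to extract the EPL 2017/18 data for 'wins'.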
from bs4 import BeautifulSoup
import requests
import json
url = "https://www.premierleague.com/stats/top/clubs/wins?se=79T"
data = requests.get(url).text
soup = BeautifulSoup(data, "html.parser")
PLtable = soup.find_all('table')[0]
data = []
for td in PLtable.find_all("td"):
    data.append(td.text.replace('\n', ' ').strip())
Answer 0 (score: 1)
The data is loaded as JSON through an API, so it is not in the HTML that requests downloads. See the API URL in the code below.
import requests
url = 'https://footballapi.pulselive.com/football/stats/ranked/teams/wins?page=0&pageSize=20&compSeasons=79&comps=1&altIds=true'
headers = {
    "Host": "footballapi.pulselive.com",
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:84.0) Gecko/20100101 Firefox/84.0",
    "Accept": "*/*",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate, br",
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "Origin": "https://www.premierleague.com",
    "Connection": "keep-alive",
    "Referer": "https://www.premierleague.com/stats/top/clubs/wins?se=79",
    # "If-None-Match" is left out: a cached ETag can produce a 304 response
    # with an empty body, which would make .json() fail.
    "TE": "Trailers",
}
results = requests.get(url, headers=headers).json()
for data in results['stats']['content']:
    print(data['owner']['name'], data['value'])
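The `page` and `pageSize` parameters in the URL above suggest the endpoint is paginated, which matches the suspicion in the question. The following is a minimal sketch of walking the pages, not a confirmed description of the API: it assumes the response keeps the `stats` / `content` shape used above and that a short or empty page marks the end. The names `fetch_page` and `collect_all` are hypothetical, and the standard library is used instead of requests only to keep the sketch free of third-party dependencies. The API may also require more of the headers listed above.

```python
import json
import urllib.parse
import urllib.request

# Endpoint and query values are copied from the URL in the answer above.
API_URL = "https://footballapi.pulselive.com/football/stats/ranked/teams/wins"
PAGE_SIZE = 20


def fetch_page(page):
    """Request one page of the ranking and return the decoded JSON."""
    query = urllib.parse.urlencode({
        "page": page,
        "pageSize": PAGE_SIZE,
        "compSeasons": 79,
        "comps": 1,
        "altIds": "true",
    })
    request = urllib.request.Request(
        f"{API_URL}?{query}",
        headers={
            "Origin": "https://www.premierleague.com",
            "User-Agent": "Mozilla/5.0",
        },
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def collect_all(fetch=fetch_page, page_size=PAGE_SIZE, max_pages=50):
    """Walk pages 0, 1, 2, ... until a page comes back short or empty.

    Assumption: a page with fewer than page_size entries is the last one.
    max_pages is a safety cap so the loop always terminates.
    """
    rows = []
    for page in range(max_pages):
        content = fetch(page)["stats"]["content"]
        rows.extend((item["owner"]["name"], item["value"]) for item in content)
        if len(content) < page_size:
            break
    return rows
```

Calling `collect_all()` would then return a list of `(club name, wins)` pairs across all pages. Passing a different `fetch` function makes it possible to try the paging logic without touching the network.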
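Answer 1 (score: 0)
Improving your question would make it easier for everyone to understand it and help.
What is happening?
What you can do is use Selenium. A simple example: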
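from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from time import sleep
url = 'https://www.premierleague.com/stats/top/clubs/wins?se=79T'
# Selenium 4 takes the driver path through a Service object instead of executable_path.
browser = webdriver.Chrome(service=Service(r'C:\Program Files\ChromeDriver\chromedriver.exe'))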
browser.get(url)
sleep(5)
soup=BeautifulSoup(browser.page_source,"html.parser")
PLtable = soup.find('tbody', class_='statsTableContainer')
data = []
for td in PLtable.find_all("td"):
    data.append(td.text.replace('\n', ' ').strip())
print(data)
browser.quit()  # ends the whole driver session, not just the current window
Output
['1.', 'Leicester City', '9', '', '2.', 'Liverpool', '9', '', '3.', 'Everton', '8', '', '4.', 'Manchester United', '8', '', '5.', 'Aston Villa', '7', '', '6.', 'Southampton', '7', '', '7.', 'Tottenham Hotspur', '7', '', '8.', 'Chelsea', '6', '', '9.', 'Manchester City', '6', '', '10.', 'West Ham United', '6', '', '11.', 'Wolverhampton Wanderers', '6', '', '12.', 'Crystal Palace', '5', '', '13.', 'Leeds United', '5', '', '14.', 'Newcastle United', '5', '', '15.', 'Arsenal', '4', '', '16.', 'Brighton and Hove Albion', '2', '', '17.', 'Burnley', '2', '', '18.', 'Fulham', '2', '', '19.', 'West Bromwich Albion', '1', '']
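The flat list above repeats in groups of four cells: rank, club, wins, and an empty trailing cell. As a sketch under that assumption (taken from the output shown here, not from any documented page layout), the list can be regrouped into one tuple per club. The helper name `to_rows` is hypothetical.

```python
CELLS_PER_ROW = 4  # rank, club, wins, empty trailing cell (assumed from the output above)


def to_rows(cells):
    """Group a flat list of cell texts into (rank, club, wins) tuples."""
    rows = []
    # Step through the list one full row at a time; an incomplete tail is ignored.
    for start in range(0, len(cells) - CELLS_PER_ROW + 1, CELLS_PER_ROW):
        rank, club, wins, _spacer = cells[start:start + CELLS_PER_ROW]
        rows.append((int(rank.rstrip(".")), club, int(wins)))
    return rows


sample = ['1.', 'Leicester City', '9', '', '2.', 'Liverpool', '9', '']
for row in to_rows(sample):
    print(row)
# (1, 'Leicester City', 9)
# (2, 'Liverpool', 9)
```

Tuples in this shape are easier to sort, filter, or write to CSV than the flat list.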