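I am trying to scrape some data from ESPN and run some calculations on it. Ideally, I would like to iterate over a dataframe, take each player name, send it to the search box with Selenium, and have Selenium click the player's name. I can do this successfully for one player, but I am not sure how to iterate over all the players in my dataframe.
The second part of the code is where I am struggling. For some reason I cannot get the data: Selenium is unable to find any of the elements, and I don't think I am going about it the right way. If I can scrape the data I need, I would like to plug it into a calculation and append the resulting projected points to my dataframe, dfNBA.
Can someone help me with the code and point me in the right direction? I am trying to get more efficient at writing Python, but right now I am stuck.
Thanks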
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
import pandas as pd
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
#sample data
pp = {'Player Name':['Donovan Mitchell', 'Kawhi Leonard', 'Rudy Gobert', 'Paul George','Reggie Jackson', 'Jordan Clarkson'],
'Fantasy Score': [46.0, 50.0, 40.0, 44.0, 25.0, 26.5]}
#Creating a dataframe from dictionary
dfNBA = pd.DataFrame(pp)
#Scraping ESPN
PATH = "C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get("https://www.espn.com/")
#Clicking the search button
driver.find_element_by_xpath("//a[@id='global-search-trigger']").click()
#sending data to the search button
driver.find_element_by_xpath("//input[@placeholder='Search Sports, Teams or Players...']").send_keys(dfNBA.iloc[0,:].values[0])
WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".search_results__details")))
playerPage = driver.find_element_by_css_selector(".search_results__details").click()
#Scraping data from last 10 games
points = driver.find_element_by_xpath(".//div[@class='Table__TD']")[13]
#rebs = driver.find_element_by_xpath("//*[@id='fittPageContainer'']/div[2]/div[5]/div/div[1]/div[1]/section/div/div[3]/div/div/div[2]/table/tbody/tr[1]/td[7]")
#asts = driver.find_element_by_xpath("//*[@id='fittPageContainer']/div[2]/div[5]/div/div[1]/div[1]/section/div/div[3]/div/div/div[2]/table/tbody/tr[1]/td[8]")
#blks = driver.find_element_by_xpath("//*[@id='fittPageContainer']/div[2]/div[5]/div/div[1]/div[1]/section/div/div[3]/div/div/div[2]/table/tbody/tr[1]/td[9]")
#stls = driver.find_element_by_xpath("//*[@id='fittPageContainer']/div[2]/div[5]/div/div[1]/div[1]/section/div/div[3]/div/div/div[2]/table/tbody/tr[1]/td[10]")
#tnvrs = driver.find_element_by_xpath("//*[@id='fittPageContainer']/div[2]/div[5]/div/div[1]/div[1]/section/div/div[3]/div/div/div[2]/table/tbody/tr[1]/td[12]")
#projectedPoints = points+(rebs*1.2)+(asts*1.5)+(blks*3)+(stls*3)-(tnvrs*1)
print(points)
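Answer 0 (score: 1)
Here is some code that does what (I think) you want. You need to wait for the table elements to appear, fix your XPath, and pick the right elements from the resulting list of table cells.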
# Relies on the imports from the question (webdriver, pd, By, WebDriverWait, EC)
# and on the same Selenium 3 style find_element_by_* calls.
pp = {'Player Name': ['Donovan Mitchell', 'Kawhi Leonard', 'Rudy Gobert', 'Paul George', 'Reggie Jackson', 'Jordan Clarkson'],
      'Fantasy Score': [46.0, 50.0, 40.0, 44.0, 25.0, 26.5]}
# Creating a dataframe from dictionary
dfNBA = pd.DataFrame(pp)
# Scraping ESPN
PATH = r"C:\Program Files (x86)\chromedriver.exe"  # raw string keeps the backslashes literal
driver = webdriver.Chrome(PATH)
driver.get("https://www.espn.com/")
# Clicking the search button
driver.find_element_by_xpath("//a[@id='global-search-trigger']").click()
# Sending the first player's name to the search box
driver.find_element_by_xpath("//input[@placeholder='Search Sports, Teams or Players...']").send_keys(dfNBA.iloc[0, :].values[0])
WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".search_results__details")))
# Open the player page (click() returns None, so there is nothing to keep)
driver.find_element_by_css_selector(".search_results__details").click()
# Scraping data from last 10 games: wait until the table cells are visible
WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//td[@class='Table__TD']")))
# Look the cells up once, then index into the list
cells = driver.find_elements_by_xpath("//td[@class='Table__TD']")
points = cells[12].text
rebs = cells[6].text
asts = cells[7].text
blks = cells[8].text
stls = cells[9].text
tnvrs = cells[11].text
# The cell text is a string, so convert to float before calculating
projectedPoints = float(points) + (float(rebs)*1.2) + (float(asts)*1.5) + (float(blks)*3) + (float(stls)*3) - (float(tnvrs)*1)
print(projectedPoints)
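To cover the other half of the question (running this for every player rather than only the first row), one option is to separate the scoring formula from the scraping and loop over the names. What follows is only a sketch: `WEIGHTS`, `projected_points`, `project_all` and `fake_fetch_stats` are names made up for illustration, and `fake_fetch_stats` returns invented numbers in place of the real Selenium lookups.

```python
# Weights taken from the projection formula in the question.
WEIGHTS = {"points": 1.0, "rebs": 1.2, "asts": 1.5,
           "blks": 3.0, "stls": 3.0, "tnvrs": -1.0}


def projected_points(stats):
    """Weighted sum of the stats; values may be numbers or numeric strings."""
    return sum(weight * float(stats[name]) for name, weight in WEIGHTS.items())


def project_all(player_names, fetch_stats):
    """Return {player name: projection} for every name.

    fetch_stats is a hypothetical callable (for example, a function that
    wraps the search-and-scrape steps) taking a player name and returning
    a dict with the keys used in WEIGHTS.
    """
    return {name: projected_points(fetch_stats(name)) for name in player_names}


# Invented placeholder numbers, NOT real ESPN statistics.
MADE_UP_STATS = {
    "Player A": {"points": "25", "rebs": "10", "asts": "4",
                 "blks": "1", "stls": "2", "tnvrs": "3"},
    "Player B": {"points": "30", "rebs": "5", "asts": "2",
                 "blks": "0", "stls": "1", "tnvrs": "2"},
}


def fake_fetch_stats(name):
    """Stand-in for the real scraper so the sketch runs without a browser."""
    return MADE_UP_STATS[name]


projections = project_all(["Player A", "Player B"], fake_fetch_stats)
print(projections)
```

With the real dataframe, the names would come from `dfNBA['Player Name']`, and the result could be attached as a new column with something like `dfNBA['Projected Points'] = dfNBA['Player Name'].map(projections)`.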
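Answer 1 (score: 1)
I think Selenium is overkill when there is a workable API option.
Try this. Note that in the overview, L10 games refers to the last 10 regular-season games. My code here uses the last 10 games including the playoffs. If you only want regular-season games, let me know and I can adjust it. I also added a variable here, so if you want, for example, only the last 5 games or the last 15 games, you can do that as well.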
import requests
import pandas as pd

previous_games = 10
pp = {'Player Name': ['Donovan Mitchell', 'Kawhi Leonard', 'Rudy Gobert', 'Paul George', 'Reggie Jackson', 'Jordan Clarkson'],
      'Fantasy Score': [46.0, 50.0, 40.0, 44.0, 25.0, 26.5]}
# Creating a dataframe from dictionary
dfNBA = pd.DataFrame(pp)
search_api = 'https://site.api.espn.com/apis/search/v2'
for idx, row in dfNBA.iterrows():
    playerName = row['Player Name']
    payload = {'query': playerName}
    results = requests.get(search_api, params=payload).json()['results']
    # Take the ID from the first search result of type 'player'
    playerID = None
    for each in results:
        if each['type'] == 'player':
            playerID = each['contents'][0]['uid'].split('a:')[-1]
            break
    if playerID is None:
        print('%s: no player found' % playerName)
        continue
    player_api = 'https://site.web.api.espn.com/apis/common/v3/sports/basketball/nba/athletes/%s/gamelog' % playerID
    season_payload = {'season': '2021'}
    jsonData_player = requests.get(player_api, params=season_payload).json()
    # Scraping data from last x games
    last_x_gameIDs = list(jsonData_player['events'].keys())
    last_x_gameIDs.sort()
    last_x_gameIDs = last_x_gameIDs[-1*previous_games:]
    gamelog_dict = {}
    seasonTypes = jsonData_player['seasonTypes']
    for gameID in last_x_gameIDs:
        for each in seasonTypes:
            categories = each['categories']
            for category in categories:
                # Skip the aggregated rows and keep only per-game entries
                if category['type'] == 'total':
                    continue
                events = category['events']
                for event in events:
                    if gameID == event['eventId']:
                        gamelog_dict[gameID] = event['stats']
    labels = jsonData_player['labels']
    # Pair each game's stat values with their labels
    for k, v in gamelog_dict.items():
        v = dict(zip(labels, v))
        gamelog_dict[k] = v
    stats = pd.DataFrame(gamelog_dict.values())
    # Per-game averages over the selected games
    points = stats['PTS'].astype(float).sum() / previous_games
    rebs = stats['REB'].astype(float).sum() / previous_games
    asts = stats['AST'].astype(float).sum() / previous_games
    blks = stats['BLK'].astype(float).sum() / previous_games
    stls = stats['STL'].astype(float).sum() / previous_games
    tnvrs = stats['TO'].astype(float).sum() / previous_games
    projectedPoints = points + (rebs*1.2) + (asts*1.5) + (blks*3) + (stls*3) - (tnvrs*1)
    print('%s: %.02f' % (playerName, projectedPoints))
Output:
Donovan Mitchell: 42.72
Kawhi Leonard: 52.25
Rudy Gobert: 38.47
Paul George: 44.18
Reggie Jackson: 24.21
Jordan Clarkson: 25.88
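One thing to watch with the `previous_games` variable: the code divides every total by `previous_games`, so a player with fewer games on record would get averages that are too low. A minimal sketch of averaging over the games actually found is below. `average_last_games` and `sample_log` are hypothetical names, the numbers are invented rather than real ESPN figures, and, as in the code above, game IDs are assumed to sort in chronological order.

```python
def average_last_games(gamelog, stat, n_games=10):
    """Average `stat` over the most recent `n_games` entries of `gamelog`.

    `gamelog` maps a game ID string to a dict of stat label -> string value,
    the same shape as gamelog_dict after the labels have been zipped in.
    """
    recent_ids = sorted(gamelog)[-n_games:]
    if not recent_ids:
        return 0.0
    values = [float(gamelog[game_id][stat]) for game_id in recent_ids]
    # Divide by the number of games actually found, not by n_games.
    return sum(values) / len(values)


# Invented sample data for illustration only.
sample_log = {
    "401000001": {"PTS": "20", "REB": "5"},
    "401000002": {"PTS": "30", "REB": "7"},
    "401000003": {"PTS": "25", "REB": "6"},
}

print(average_last_games(sample_log, "PTS", n_games=2))   # uses the last two games
print(average_last_games(sample_log, "PTS", n_games=10))  # only three games exist
```

Whether a short game log should be averaged this way or skipped altogether depends on how the projection will be used, so treat this as one possible choice.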