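I'm new to coding and to Selenium. I'm hoping to scrape NFL QB data for Passing Yards.
I'm looking for the following data:
1) The QB's full name (e.g., Drew Brees)
2) The line (e.g., 308.5)
3) The stats from the last 5 games (e.g., 349, 184, 311, 228, 287)
You can see the QB's name and the line below.
However, I'm looking for the full name: Drew Brees, not D. Brees. To get both the full name and the last 5 games' data, I need to click the name D. Brees to open a pop-up screen.
Here is an example of the output I'm looking for: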
Player Line Last 5 Games
Drew Brees 308.5 349, 184, 311, 228, 287
Jacoby Brissett 230.5 251, 319, 129, 148, 59
Here is my code so far:
import pandas as pd
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver
import time
# Raw string so the backslashes in the Windows path are not read as escape sequences
driver=webdriver.Chrome(r"C:\webdrivers\chromedriver.exe")
driver.maximize_window()
driver.get("https://www.betonline.ag/sportsbook/player-props")
WebDriverWait(driver,20).until(EC.frame_to_be_available_and_switch_to_it((By.ID,"builder")))
time.sleep(2)
WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.XPATH,"//li[@class='one-third one-third-remove']//a[./b[contains(.,'Over / Under')]]"))).click()
time.sleep(2)
WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"div[ng-if='selected.league']"))).click()
time.sleep(2)
WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.XPATH,"//li[@ng-repeat='league in leagues']/a[.//span[text()='NFL']]"))).click()
time.sleep(2)
WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"div[ng-if^='selected.game']"))).click()
time.sleep(2)
WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.XPATH,"//li/a[.//div[text()='All Available']]"))).click()
WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.XPATH,"//span[contains(.,'Passing Yards')]"))).click()
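Any help would be greatly appreciated. Thanks in advance for any input.
Answer 0 (score: 1)
You can get the data from the API instead. The key is to get the player IDs, so that you can pass each one as a parameter to fetch that player's stats for the last 5 games. After that, it is just a matter of flattening the nested JSON response to build the table. The response actually contains more than just passing yards; the code below is hard-coded to build the table you asked for. (With a little tweaking you can pull whatever else you need from it):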
import requests
import pandas as pd
import re
def flatten_json(y):
    out = {}
    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x
    flatten(y)
    return out
url = 'https://betbuilder.digitalsportstech.com/api/feed'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'}
payload = {
'betType': 'in,18,19',
'isActive': '1',
'limit': '9999',
'sb': 'betonline',
'tz': '0'}
jsonData = requests.get(url, headers=headers, params=payload).json()
# Get the player IDs
bets = []
for each in jsonData['data']:
    if 'Passing Yards' in each['statistic']['title']:
        bets.append(each)
playerDict = {}
for each in bets:
    if each['player1']['id'] not in playerDict.keys():
        playerDict[each['player1']['id']] = {'name': each['player1']['name'],
                                             'line': each['markets'][0]['value']}
# Get last 5 game stats
url = 'https://betbuilder.digitalsportstech.com/api/player-stats'
for playerId in playerDict:
    payload = {
        'playerId': playerId,
        'sb': 'betonline',
        'tz': '0'}
    jsonData = requests.get(url, headers=headers, params=payload).json()
    stats = jsonData['data'][0]
    flat = flatten_json(stats)
    results = pd.DataFrame()
    special_cols = []
    columns_list = list(flat.keys())
    for item in columns_list:
        # Keys containing '_<n>_' belong to row n; all other keys are scalar columns
        try:
            row_idx = re.findall(r'\_(\d+)\_', item )[0]
        except IndexError:
            special_cols.append(item)
            continue
        column = re.findall(r'\_\d+\_(.*)', item )[0]
        column = column.replace('_', '')
        row_idx = int(row_idx)
        value = flat[item]
        results.loc[row_idx, column] = value
    for item in special_cols:
        results[item] = flat[item]
    playerDict[playerId]['Last 5 Games'] = list(results['statisticspassing-yards'])
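# Create the table
# DataFrame.append was removed in pandas 2.0, so build the frame in one call;
# sort_index(axis=1) keeps the alphabetical column order that sort=True produced
df = pd.DataFrame(list(playerDict.values())).sort_index(axis=1)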
Output:
print (df.to_string())
Last 5 Games line name
0 [207.0, 242.0, 276.0, 319.0, 220.0] 250.5 Kirk Cousins
1 [203.0, 195.0, 243.0, 104.0, 233.0] 244.5 Aaron Rodgers
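
To get from this result to the layout requested in the question (Player, Line, Last 5 Games as a comma-separated string), something like the following might work. It is only a sketch: it assumes `playerDict` has the shape built above, the two entries are copied from the output above, the keys `1` and `2` are placeholder IDs rather than real player IDs, and `format_props_table` is a hypothetical helper name, not part of any library.

```python
# Sample data copied from the output above. In the real script this would be
# the playerDict built from the API responses; the keys 1 and 2 are placeholders.
player_dict = {
    1: {'name': 'Kirk Cousins', 'line': 250.5,
        'Last 5 Games': [207.0, 242.0, 276.0, 319.0, 220.0]},
    2: {'name': 'Aaron Rodgers', 'line': 244.5,
        'Last 5 Games': [203.0, 195.0, 243.0, 104.0, 233.0]},
}


def format_props_table(players):
    """Return a plain-text table with Player, Line and Last 5 Games columns."""
    rows = [('Player', 'Line', 'Last 5 Games')]
    for info in players.values():
        # Yardage arrives as floats (207.0); show whole numbers as in the question.
        # Assumes every value is a real number (no missing games).
        last5 = ', '.join(str(int(v)) for v in info['Last 5 Games'])
        rows.append((info['name'], str(info['line']), last5))
    # Pad each column to the width of its longest cell
    widths = [max(len(row[i]) for row in rows) for i in range(3)]
    lines = []
    for row in rows:
        cells = [cell.ljust(width) for cell, width in zip(row, widths)]
        lines.append('  '.join(cells).rstrip())
    return '\n'.join(lines)


table = format_props_table(player_dict)
print(table)
```

The helper only reads the `name`, `line` and `Last 5 Games` keys, so it should accept the real `playerDict` unchanged as long as every player has all five values.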