How to scrape innerHTML with Selenium to get NFL betting data?

Date: 2019-12-16 21:56:19

Tags: python selenium-webdriver web-scraping

I'm new to coding and learning Selenium. I'm hoping to scrape NFL QB data for Passing Yards.

I'm looking for the following data:

1) The QB's full name (e.g., Drew Brees)
2) The line (e.g., 308.5)
3) Stats from the last 5 games (e.g., 349, 184, 311, 228, 287)

You can see the QB names and lines below.

[Screenshot: QB names and lines]

However, I need the full name: Drew Brees, not D. Brees. To get the full name and the last 5 games, I have to click the name D. Brees to open a pop-up.

[Screenshot: player pop-up]

Here is an example of the output I'm looking for:

Player             Line     Last 5 Games
Drew Brees         308.5    349, 184, 311, 228, 287
Jacoby Brissett    230.5    251, 319, 129, 148, 59

Here is my code so far:

import time

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Selenium 4 takes the driver path through a Service object; the raw string keeps the backslashes literal
driver = webdriver.Chrome(service=Service(r"C:\webdrivers\chromedriver.exe"))
driver.maximize_window()
driver.get("https://www.betonline.ag/sportsbook/player-props")

# The props widget lives inside the iframe with id="builder"
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.ID, "builder")))
time.sleep(2)

# Choose the "Over / Under" tab
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//li[@class='one-third one-third-remove']//a[./b[contains(.,'Over / Under')]]"))).click()
time.sleep(2)

# Open the league drop-down and pick NFL
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div[ng-if='selected.league']"))).click()
time.sleep(2)
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//li[@ng-repeat='league in leagues']/a[.//span[text()='NFL']]"))).click()
time.sleep(2)

# Open the game drop-down and pick "All Available"
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div[ng-if^='selected.game']"))).click()
time.sleep(2)
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//li/a[.//div[text()='All Available']]"))).click()

# Filter to the Passing Yards props
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//span[contains(.,'Passing Yards')]"))).click()
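Once the Passing Yards list is showing, the remaining step is reading each player's pop-up. Selenium can return an element's markup as a string with `element.get_attribute("innerHTML")`; the sketch below parses such a string with Python's standard-library `html.parser`. The markup and the class names `player-name` and `game-stat` are hypothetical placeholders, not the site's actual HTML, so inspect the real pop-up and adjust them.

```python
from html.parser import HTMLParser

# Hypothetical pop-up markup; the real site's tags and class names will differ.
SAMPLE_INNER_HTML = """
<div class="popup">
  <h2 class="player-name">Drew Brees</h2>
  <ul>
    <li class="game-stat">349</li>
    <li class="game-stat">184</li>
    <li class="game-stat">311</li>
    <li class="game-stat">228</li>
    <li class="game-stat">287</li>
  </ul>
</div>
"""


class PopupParser(HTMLParser):
    """Collects text from elements carrying the (hypothetical) classes above."""

    def __init__(self):
        super().__init__()
        self.name = None
        self.stats = []
        self._current = None  # which kind of text the next data chunk belongs to

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "player-name" in classes:
            self._current = "name"
        elif "game-stat" in classes:
            self._current = "stat"

    def handle_endtag(self, tag):
        self._current = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._current is None:
            return  # skip whitespace and text outside the elements we track
        if self._current == "name":
            self.name = text
        else:
            self.stats.append(int(text))


def parse_popup(inner_html):
    parser = PopupParser()
    parser.feed(inner_html)
    parser.close()
    return parser.name, parser.stats


# With Selenium, inner_html would come from something like:
#   driver.find_element(By.CSS_SELECTOR, "...").get_attribute("innerHTML")
name, last5 = parse_popup(SAMPLE_INNER_HTML)
print(name, last5)  # Drew Brees [349, 184, 311, 228, 287]
```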

Any help would be greatly appreciated. Thanks in advance for any input.

1 Answer:

Answer 0 (score: 1)

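You can get the data straight from the API instead of clicking through the page. The key is to get the player IDs, because each ID is passed as a parameter to the stats endpoint to fetch the last 5 games. After that, it is just a matter of flattening the nested JSON response to build the table. The stats request actually returns more than passing yards, but I hard-coded the passing-yards column to build the table you asked for (with a little manipulation you can pull whatever stats you want):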
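Because the table-building step depends on the key names that `flatten_json` produces, here is a self-contained copy of that function (same logic as in the code below) applied to a small nested dict. The `sample` shape is invented for illustration and is not the API's actual response.

```python
def flatten_json(y):
    """Flatten nested dicts/lists into one dict whose keys are joined with '_'."""
    out = {}

    def flatten(x, name=""):
        if isinstance(x, dict):
            for key in x:
                flatten(x[key], name + key + "_")
        elif isinstance(x, list):
            for i, item in enumerate(x):
                flatten(item, name + str(i) + "_")
        else:
            out[name[:-1]] = x  # drop the trailing '_'

    flatten(y)
    return out


# Hypothetical response fragment: a list of games, each with nested statistics.
sample = {"player": "Drew Brees",
          "games": [{"statistics": {"passing-yards": 349}},
                    {"statistics": {"passing-yards": 184}}]}

print(flatten_json(sample))
# prints (on one line):
# {'player': 'Drew Brees', 'games_0_statistics_passing-yards': 349,
#  'games_1_statistics_passing-yards': 184}
```

List positions become the numeric `_<n>_` segment in each key, which is what the regexes in the answer use as the row index.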

import requests
import pandas as pd
import re


def flatten_json(y):
    """Flatten nested dicts/lists into a single dict whose keys are joined with '_'."""
    out = {}
    def flatten(x, name=''):
        if isinstance(x, dict):
            for a in x:
                flatten(x[a], name + a + '_')
        elif isinstance(x, list):
            for i, a in enumerate(x):
                flatten(a, name + str(i) + '_')
        else:
            out[name[:-1]] = x  # strip the trailing '_'
    flatten(y)
    return out




url = 'https://betbuilder.digitalsportstech.com/api/feed'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'}
payload = {
'betType': 'in,18,19',
'isActive': '1',
'limit': '9999',
'sb': 'betonline',
'tz': '0'}

response = requests.get(url, headers=headers, params=payload, timeout=30)
response.raise_for_status()
jsonData = response.json()

# Get the player IDs
bets = []
for each in jsonData['data']:
    if 'Passing Yards' in each['statistic']['title']:
        bets.append(each)

playerDict = {}
for each in bets:
    if each['player1']['id'] not in playerDict.keys():
        playerDict[each['player1']['id']] = {'name':each['player1']['name'],
                                              'line':each['markets'][0]['value']}

# Get last 5 game stats for each player
url = 'https://betbuilder.digitalsportstech.com/api/player-stats'
for playerId in playerDict:
    payload = {
                'playerId': playerId,
                'sb': 'betonline',
                'tz': '0'}

    response = requests.get(url, headers=headers, params=payload, timeout=30)
    response.raise_for_status()
    jsonData = response.json()
    stats = jsonData['data'][0]
    flat = flatten_json(stats)

    results = pd.DataFrame()
    special_cols = []

    # Keys containing a numeric index ("..._<n>_...") become cells of row n;
    # keys without one are player-level fields copied to every row
    columns_list = list(flat.keys())
    for item in columns_list:
        try:
            row_idx = re.findall(r'\_(\d+)\_', item)[0]
        except IndexError:
            special_cols.append(item)
            continue
        column = re.findall(r'\_\d+\_(.*)', item)[0]
        column = column.replace('_', '')

        row_idx = int(row_idx)
        value = flat[item]

        results.loc[row_idx, column] = value

    for item in special_cols:
        results[item] = flat[item]

    playerDict[playerId]['Last 5 Games'] = list(results['statisticspassing-yards'])


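# Create the table. DataFrame.append was removed in pandas 2.0, so build it in one step;
# sort_index(axis=1) keeps the alphabetical column order that append(..., sort=True) produced
df = pd.DataFrame(list(playerDict.values())).sort_index(axis=1)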
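The loop in the code above fills `results` one cell at a time with `.loc`. An alternative sketch that uses only `re` and plain dicts groups the flattened keys by their first numeric index; it assumes the `<prefix>_<n>_<rest>` key pattern that the answer's regexes expect, and the sample keys are illustrative, not taken from the API.

```python
import re
from collections import defaultdict

# Same pattern as the answer's regexes: a prefix, a numeric row index, then the column path.
KEY_RE = re.compile(r"_(\d+)_(.*)")


def rows_from_flat(flat):
    """Group flattened keys like 'games_0_statistics_passing-yards' into per-row dicts."""
    rows = defaultdict(dict)
    for key, value in flat.items():
        match = KEY_RE.search(key)
        if match is None:
            continue  # keys without a row index (player-level fields)
        idx, column = int(match.group(1)), match.group(2).replace("_", "")
        rows[idx][column] = value
    return [rows[i] for i in sorted(rows)]


flat = {"player": "Drew Brees",
        "games_0_statistics_passing-yards": 349,
        "games_1_statistics_passing-yards": 184}
rows = rows_from_flat(flat)
print([row["statisticspassing-yards"] for row in rows])  # [349, 184]
```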

Output:

print(df.to_string())
                          Last 5 Games   line           name
0  [207.0, 242.0, 276.0, 319.0, 220.0]  250.5   Kirk Cousins
1  [203.0, 195.0, 243.0, 104.0, 233.0]  244.5  Aaron Rodgers
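The resulting DataFrame stores the last five games as lists of floats, whereas the question asked for comma-separated integers under `Player`, `Line` and `Last 5 Games`. A standard-library sketch of that final formatting step, fed with the sample values from the question for illustration:

```python
def format_table(players):
    """Render (name, line, last5) records in the layout the question asked for."""
    lines = [f"{'Player':<22}{'Line':<10}Last 5 Games"]
    for name, line, last5 in players:
        games = ", ".join(str(int(y)) for y in last5)  # 349.0 -> "349"
        lines.append(f"{name:<22}{line:<10}{games}")
    return "\n".join(lines)


players = [("Drew Brees", 308.5, [349.0, 184.0, 311.0, 228.0, 287.0]),
           ("Jacoby Brissett", 230.5, [251.0, 319.0, 129.0, 148.0, 59.0])]
print(format_table(players))
```

With the `df` built in the answer, the records could be passed as `zip(df['name'], df['line'], df['Last 5 Games'])`.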