使用Python BeautifulSoup搜刮NBA高级统计信息

时间:2019-03-05 02:44:00

标签: python web-scraping beautifulsoup

我想了解NBA的高级数据。首先,我只希望能够刮擦团队的名称,但是我遇到一个问题,即它不收集任何信息。我可能在find_all函数中寻找错误的东西。任何帮助表示赞赏!

import requests
from bs4 import BeautifulSoup

url = "https://stats.nba.com/teams/elbow-touch/?sort=ELBOW_TOUCHES&dir=-1"
result = requests.get(url)
c = result.content

soup = Beaut ifulSoup(c,"html.parser")

title = soup.title.text
print(title)

teams = soup.find_all('td',{'class':'team'})

for element in teams:
    print(element.text)

我要抓取的网站:

Site that I want to scrape

3 个答案:

答案 0 :(得分:2)

另一种方法是将get请求发送到站点API并接收json响应。通过更改参数,可以获得不同的结果。

您可以在Chrome开发者工具下查找浏览器将请求发送到的位置。

import requests

url = "https://stats.nba.com/stats/leaguedashptstats?"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36"
}

params = {
    "PerMode": "PerGame",
    "PlayerOrTeam": "Team",
    "PtMeasureType": "ElbowTouch",
    "Season": "2018-19",
    "SeasonType": "Regular Season",
    "StarterBench": "",
    "PlayerPosition": "",
    "PlayerExperience": "",
    "GameScope": "",
    "VsConference": "",
    "VsDivision": "",
    "DateFrom": "",
    "DateTo": "",
    "SeasonSegment": "",
    "Location": "",
    "Outcome": "",
    "LastNGames": "0",
    "Month": "0",
    "OpponentTeamID": "0"
}

r = requests.get(url, params=params, headers=headers)
data = r.json()
results = data['resultSets'][0]['rowSet']

for result in results:
    print(result)

答案 1 :(得分:1)

网站是动态的,因此,您需要使用selenium

from selenium import webdriver
from bs4 import BeautifulSoup as soup 
d = webdriver.Chrome('/path/to/chromedriver')
d.get('https://stats.nba.com/teams/elbow-touch/?sort=ELBOW_TOUCHES&dir=-1')
s = soup(d.page_source, 'html.parser').find('table', {'class':'table'})
headers, [_, *data] = [i.text for i in s.find_all('th')], [[i.text for i in b.find_all('td')] for b in s.find_all('tr')]
final_data = [i for i in data if len(i) > 1]

现在,final_data存储所有团队结果:

[['Houston Rockets', '63', '38', '25', '242.0', '367.0', '8.8', '2.4', '3.8', '64.2', '0.4', '0.7', '62.8', '5.5', '-', '3.7', '-', '0.5', '14.0', '0.5', '5.4', '0.3', '-'], ['Milwaukee Bucks', '63', '48', '15', '241.2', '409.5', '9.5', '2.3', '3.6', '62.4', '0.7', '1.0', '73.3', '5.4', '-', '4.3', '-', '0.6', '13.0', '0.5', '5.2', '0.4', '-'], ['New York Knicks', '62', '13', '49', '241.6', '420.4', '9.5', '2.0', '3.4', '56.8', '0.7', '1.0', '69.8', '4.8', '-', '4.7', '-', '0.6', '13.7', '0.5', '5.3', '0.5', '-'], ['Charlotte Hornets', '63', '29', '34', '242.0', '409.7', '9.6', '1.7', '3.5', '50.0', '1.1', '1.5', '71.9', '4.7', '-', '4.6', '-', '0.7', '14.2', '0.4', '4.5', '0.7', '-'], ['Detroit Pistons', '62', '31', '31', '242.8', '437.0', '10.0', '1.6', '3.2', '51.3', '0.9', '1.2', '75.3', '4.4', '-', '5.0', '-', '0.9', '17.6', '0.7', '6.8', '0.6', '-'], ['Washington Wizards', '62', '25', '37', '243.2', '420.2', '10.5', '2.5', '4.3', '58.4', '0.9', '1.2', '76.4', '6.1', '-', '4.6', '-', '0.7', '15.5', '0.6', '5.6', '0.5', '-'], ['Atlanta Hawks', '64', '22', '42', '242.3', '434.9', '11.0', '2.2', '3.7', '58.6', '1.2', '1.5', '77.3', '5.7', '-', '5.3', '-', '0.7', '12.9', '0.7', '6.5', '0.7', '-'], ['Brooklyn Nets', '65', '32', '33', '243.8', '440.3', '11.2', '2.5', '4.4', '58.3', '1.2', '1.7', '70.8', '6.4', '-', '4.6', '-', '0.7', '14.9', '0.9', '7.9', '0.8', '-'], ['San Antonio Spurs', '64', '35', '29', '241.6', '402.3', '11.3', '2.3', '4.1', '55.5', '0.8', '1.0', '85.7', '5.6', '-', '5.8', '-', '1.1', '18.7', '0.5', '4.8', '0.4', '-'], ['Boston Celtics', '64', '38', '26', '241.6', '420.8', '11.5', '2.5', '4.2', '58.4', '0.5', '0.7', '71.7', '5.5', '-', '5.7', '-', '0.9', '15.0', '0.6', '5.6', '0.3', '-'], ['Toronto Raptors', '64', '46', '18', '242.3', '418.0', '11.5', '3.5', '5.9', '59.6', '1.2', '1.5', '78.1', '8.3', '-', '4.1', '-', '0.7', '16.3', '0.4', '3.7', '0.7', '-'], ['Portland Trail Blazers', '63', '39', '24', '241.6', '409.8', '11.8', '2.4', '4.6', '51.9', '1.2', '1.5', '80.2', '6.1', '-', '5.5', '-', '1.0', '18.8', '0.7', '5.7', '0.7', '-'], ['Utah Jazz', '61', '36', '25', '240.8', '435.9', '11.9', '2.0', '3.8', '51.1', '1.4', '2.2', '66.7', '5.4', '-', '5.9', '-', '1.0', '17.1', '0.7', '5.9', '1.0', '-'], ['Minnesota Timberwolves', '63', '29', '34', '241.6', '412.4', '12.0', '2.9', '5.0', '57.3', '1.3', '1.6', '79.8', '7.3', '-', '5.2', '-', '1.0', '19.5', '0.6', '5.2', '0.7', '-'], ['Chicago Bulls', '63', '18', '45', '243.2', '411.3', '12.4', '2.8', '4.8', '57.9', '0.7', '0.9', '77.6', '6.4', '-', '6.3', '-', '0.8', '12.4', '0.6', '4.5', '0.4', '-'], ['LA Clippers', '65', '36', '29', '241.9', '430.4', '12.4', '2.9', '5.1', '56.9', '1.0', '1.5', '69.5', '7.0', '-', '5.4', '-', '0.9', '15.9', '0.7', '5.5', '0.6', '-'], ['Miami Heat', '62', '28', '34', '240.4', '426.1', '12.6', '2.0', '4.0', '50.2', '0.7', '1.3', '56.8', '4.9', '-', '7.0', '-', '1.1', '15.4', '0.4', '3.4', '0.5', '-'], ['New Orleans Pelicans', '65', '29', '36', '240.0', '435.0', '12.6', '3.5', '6.4', '54.8', '1.2', '1.6', '74.5', '8.4', '-', '4.4', '-', '0.9', '20.4', '0.7', '5.2', '0.8', '-'], ['Phoenix Suns', '64', '13', '51', '242.3', '435.8', '12.9', '2.8', '5.0', '56.7', '1.0', '1.3', '73.5', '6.8', '-', '6.2', '-', '0.8', '13.7', '0.6', '4.7', '0.6', '-'], ['Oklahoma City Thunder', '63', '39', '24', '242.0', '364.8', '13.6', '3.2', '5.8', '54.5', '1.0', '1.4', '65.9', '7.5', '-', '5.8', '-', '0.9', '14.7', '0.7', '4.8', '0.6', '-'], ['Dallas Mavericks', '62', '27', '35', '240.8', '435.4', '13.9', '1.8', '3.1', '55.9', '1.2', '1.6', '76.5', '5.0', '-', '8.6', '-', '1.1', '13.1', '0.8', '5.7', '0.7', '-'], ['Golden State Warriors', '63', '44', '19', '241.6', '442.3', '13.9', '2.8', '4.8', '57.0', '1.2', '1.5', '81.7', '6.9', '-', '7.2', '-', '1.6', '21.7', '0.8', '5.8', '0.7', '-'], ['Orlando Magic', '63', '28', '35', '241.2', '405.0', '14.0', '3.2', '5.7', '55.8', '1.1', '1.4', '80.9', '7.7', '-', '6.5', '-', '1.4', '21.8', '0.6', '4.0', '0.7', '-'], ['Los Angeles Lakers', '63', '30', '33', '241.6', '405.9', '14.2', '3.3', '5.7', '57.8', '1.1', '1.6', '67.0', '7.8', '-', '6.3', '-', '1.3', '20.7', '0.9', '6.3', '0.7', '-'], ['Denver Nuggets', '62', '42', '20', '240.8', '435.2', '15.0', '3.1', '5.3', '59.1', '1.1', '1.5', '72.5', '7.5', '-', '7.4', '-', '1.7', '22.3', '1.0', '6.4', '0.7', '-'], ['Indiana Pacers', '64', '41', '23', '240.4', '431.7', '15.3', '4.4', '7.2', '60.6', '1.4', '1.9', '74.2', '10.4', '-', '5.8', '-', '1.2', '20.9', '0.9', '6.0', '0.9', '-'], ['Cleveland Cavaliers', '64', '16', '48', '241.2', '407.3', '16.1', '2.3', '4.5', '51.6', '0.9', '1.1', '80.0', '5.6', '-', '10.0', '-', '1.2', '12.3', '0.5', '3.4', '0.4', '-'], ['Philadelphia 76ers', '63', '40', '23', '242.0', '446.9', '16.6', '2.5', '4.7', '52.7', '1.4', '1.7', '82.6', '6.6', '-', '9.6', '-', '1.8', '18.6', '0.7', '4.3', '0.7', '-'], ['Sacramento Kings', '62', '31', '31', '240.8', '425.2', '16.7', '3.2', '6.3', '50.3', '1.1', '1.6', '65.3', '7.5', '-', '8.0', '-', '1.5', '18.3', '1.0', '6.2', '0.7', '-'], ['Memphis Grizzlies', '65', '25', '40', '241.9', '452.1', '20.5', '3.4', '6.7', '51.3', '1.5', '1.9', '81.1', '8.6', '-', '11.2', '-', '1.6', '14.1', '0.8', '4.1', '0.8', '-']]

只得到队伍:

teams = [a for a, *_ in final_data]

输出:

['Houston Rockets', 'Milwaukee Bucks', 'New York Knicks', 'Charlotte Hornets', 'Detroit Pistons', 'Washington Wizards', 'Atlanta Hawks', 'Brooklyn Nets', 'San Antonio Spurs', 'Boston Celtics', 'Toronto Raptors', 'Portland Trail Blazers', 'Utah Jazz', 'Minnesota Timberwolves', 'Chicago Bulls', 'LA Clippers', 'Miami Heat', 'New Orleans Pelicans', 'Phoenix Suns', 'Oklahoma City Thunder', 'Dallas Mavericks', 'Golden State Warriors', 'Orlando Magic', 'Los Angeles Lakers', 'Denver Nuggets', 'Indiana Pacers', 'Cleveland Cavaliers', 'Philadelphia 76ers', 'Sacramento Kings', 'Memphis Grizzlies']

要获取特定的统计信息,最简单的方法是通过将标头值绑定到数据列表来创建字典列表:

data_attrs = [dict(zip(headers, i)) for i in final_data]
all_touches = [i['Touches'] for i in data_attrs]

答案 2 :(得分:0)

@ Ajax1234答案的一种变化可以使您将整个表加载到数据框中:

import pandas as pd

pd.read_html(str(s))

有你的桌子。