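I'm trying to scrape a table from Basketball Reference, and it returns an empty list. I'm hoping someone can help me debug this or explain why. There are a lot of tables on that page, but I specifically want the "Miscellaneous Stats" section. Thanks in advance!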
from bs4 import BeautifulSoup
import requests
import time
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
url = 'https://www.basketball-reference.com/leagues/NBA_2020.html#all_misc_stats'
res = requests.get(url)
soup = BeautifulSoup(res.content,'lxml')
soup.find('div', {'id':'div_misc_stats'})
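Answer 0 (score: 0)
There is nothing wrong with how you parse the soup; the specific element you are looking for needs JavaScript to render. If the data is available from another source, it is better to use that.
If you really need this data, you may want to render the page first (see this for inspiration).
From my cursory analysis, no external network call seems to be made to fetch the data before it is rendered, so it is probably embedded elsewhere in the page as XML/JSON/etc., although I did not find it in my search. It may be worth looking for, especially if this is not a one-off and the alternative is a more computationally expensive approach.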
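One way to test the "embedded elsewhere in the page" idea is to check whether the element id is in the parsed tree, only inside an HTML comment, or missing altogether. The sketch below is a minimal illustration under stated assumptions: `SAMPLE_HTML` is a made-up stand-in for the downloaded page, and `locate_id` is a hypothetical helper name, not part of any library.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical stand-in for a downloaded page: the inner div is wrapped in a comment.
SAMPLE_HTML = """
<html><body>
<div id="all_misc_stats">
<!--
<div id="div_misc_stats"><table id="misc_stats"><tr><td>x</td></tr></table></div>
-->
</div>
</body></html>
"""


def locate_id(html, element_id):
    """Return 'dom', 'comment', or 'absent' for the given element id."""
    soup = BeautifulSoup(html, "html.parser")
    # Elements inside comments are not part of the parsed tree, so find() misses them.
    if soup.find(id=element_id) is not None:
        return "dom"
    needle = f'id="{element_id}"'
    # Comment nodes are strings; search their raw text for the id attribute.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        if needle in comment:
            return "comment"
    return "absent"


print(locate_id(SAMPLE_HTML, "all_misc_stats"))  # dom
print(locate_id(SAMPLE_HTML, "div_misc_stats"))  # comment
print(locate_id(SAMPLE_HTML, "no_such_id"))      # absent
```

A result of `comment` would suggest the markup is already in the response and no JavaScript rendering is needed to reach it.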
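Answer 1 (score: 0)
The site you are scraping is dynamic, so the first request does not give you all the data; you have to wait a few seconds for the JavaScript to render before the full page is available. Selenium can do this. Read the documentation, download the Chrome or Firefox driver, and use it. I wrote code that gets to this table: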
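from io import StringIO
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import pandas as pd
import time
# Path to the downloaded chromedriver binary
chromedriver = "driver/chromedriver"
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(chromedriver))
url = 'https://www.basketball-reference.com/leagues/NBA_2020.html#all_misc_stats'
driver.get(url)
# Give the page's JavaScript time to render the tables
time.sleep(15)
source = driver.page_source
driver.quit()
# Wrap the HTML in StringIO; recent pandas versions deprecate passing a literal string
tables = pd.read_html(StringIO(source))
for table in tables:
    try:
        # The misc stats table has a two-level header; column 25 is the arena column
        if 'Arena' in table.columns[25][1]:
            print(table)
    except (IndexError, TypeError):
        # Tables with fewer columns or a flat header are not the one we want
        pass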
Prints:
Rk Team Age ... Arena Attend. Attend./G
0 1.0 Milwaukee Bucks* 29.2 ... Fiserv Forum 549036 17711
1 2.0 Los Angeles Lakers* 29.6 ... STAPLES Center 588907 18997
2 3.0 Los Angeles Clippers* 27.4 ... STAPLES Center 610176 19068
3 4.0 Toronto Raptors* 26.6 ... Scotiabank Arena 633456 19796
4 5.0 Dallas Mavericks 26.2 ... American Airlines Center 682096 20062
5 6.0 Boston Celtics* 25.3 ... TD Garden 610864 19090
6 7.0 Houston Rockets* 29.1 ... Toyota Center 578458 18077
7 8.0 Utah Jazz* 27.5 ... Vivint Smart Home Arena 567486 18306
8 9.0 Denver Nuggets* 25.6 ... Pepsi Center 633153 19186
9 10.0 Oklahoma City Thunder* 25.6 ... Chesapeake Energy Arena 600699 18203
10 11.0 Miami Heat* 25.9 ... AmericanAirlines Arena 629771 19680
11 12.0 Philadelphia 76ers* 26.4 ... Wells Fargo Center 639491 20629
12 13.0 Indiana Pacers* 25.6 ... Bankers Life Fieldhouse 529002 16531
13 14.0 New Orleans Pelicans 25.4 ... Smoothie King Center 528172 16505
14 15.0 Orlando Magic 26.0 ... Amway Center 529870 17093
15 16.0 Memphis Grizzlies 24.0 ... FedEx Forum 523297 15857
16 17.0 Phoenix Suns 24.7 ... Talking Stick Resort Arena 550633 15606
17 18.0 Portland Trail Blazers 27.5 ... Moda Center 628303 19634
18 19.0 Brooklyn Nets 26.5 ... Barclays Center 524907 16403
19 20.0 San Antonio Spurs 27.9 ... AT&T Center 550515 18351
20 21.0 Sacramento Kings 27.1 ... Golden 1 Center 520663 16796
21 22.0 Minnesota Timberwolves 24.8 ... Target Center 482112 15066
22 23.0 Chicago Bulls 24.4 ... United Center 639352 18804
23 24.0 Detroit Pistons 25.9 ... Little Caesars Arena 509469 15294
24 25.0 Washington Wizards 25.4 ... Capital One Arena 532702 16647
25 26.0 New York Knicks 24.5 ... Madison Square Garden (IV) 620789 18812
26 27.0 Charlotte Hornets 24.3 ... Spectrum Center 478591 15428
27 28.0 Cleveland Cavaliers 25.0 ... Quicken Loans Arena 643008 17861
28 29.0 Atlanta Hawks 24.1 ... State Farm Arena 545453 16043
29 30.0 Golden State Warriors 24.4 ... Chase Center 614176 18064
30 NaN League Average 26.2 ... NaN 575820 17788
[31 rows x 28 columns]
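The loop in this answer picks the table by looking at column position 25, which breaks if the site adds or removes a column. A possible alternative is to search every header label for the name instead. This is only a sketch: `find_table_with_column` is a hypothetical helper, the two DataFrames are made-up stand-ins for what `pd.read_html` might return, and it matches labels exactly rather than by substring as the original loop does.

```python
import pandas as pd


def find_table_with_column(tables, column_name):
    """Return the first DataFrame whose header contains column_name, else None."""
    for table in tables:
        for col in table.columns:
            # With a multi-level header each column label is a tuple; otherwise a single label.
            labels = col if isinstance(col, tuple) else (col,)
            if any(str(label) == column_name for label in labels):
                return table
    return None


# Hypothetical stand-ins for the list returned by pd.read_html
standings = pd.DataFrame({"Team": ["A", "B"], "W": [1, 2]})
misc = pd.DataFrame(
    [["A", "Arena One", 100], ["B", "Arena Two", 200]],
    columns=pd.MultiIndex.from_tuples(
        [
            ("Unnamed: 0_level_0", "Team"),
            ("Unnamed: 1_level_0", "Arena"),
            ("Unnamed: 2_level_0", "Attend."),
        ]
    ),
)

match = find_table_with_column([standings, misc], "Arena")
print(match.columns.get_level_values(-1).tolist())  # ['Team', 'Arena', 'Attend.']
```

Because the helper returns `None` when nothing matches, the caller can report a missing table explicitly instead of silently printing nothing.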
Answer 2 (score: 0)
The data is inside an HTML comment (<!-- ... -->). You can load it into a DataFrame with this script:
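from io import StringIO
import requests
import pandas as pd
from bs4 import BeautifulSoup, Comment
url = 'https://www.basketball-reference.com/leagues/NBA_2020.html'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
# Find the section heading, then the first HTML comment after it, which holds the table markup.
# :-soup-contains replaces the deprecated :contains; string= replaces the deprecated text=.
table = soup.select_one('h2:-soup-contains("Miscellaneous Stats")').find_next(string=lambda t: isinstance(t, Comment))
# Parse the comment body as HTML and drop the top header level
df = pd.read_html(StringIO(str(table)))[0].droplevel(0, axis=1)
print(df)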
Prints:
Rk Team Age W L PW PL MOV SOS SRS ORtg DRtg ... TS% eFG% TOV% ORB% FT/FGA eFG% TOV% DRB% FT/FGA Arena Attend. Attend./G
0 1.0 Milwaukee Bucks* 29.2 53.0 12.0 52 13 11.29 -0.85 10.44 112.6 101.9 ... 0.583 0.553 12.8 20.7 0.196 0.486 12.2 81.7 0.172 Fiserv Forum 549036 17711
1 2.0 Los Angeles Lakers* 29.6 49.0 14.0 45 18 7.41 0.34 7.75 113.0 105.6 ... 0.577 0.548 13.2 24.6 0.196 0.509 13.8 78.4 0.202 STAPLES Center 588907 18997
2 3.0 Los Angeles Clippers* 27.4 44.0 20.0 44 20 6.52 0.22 6.74 113.6 107.2 ... 0.574 0.532 12.7 24.0 0.232 0.503 12.3 77.3 0.210 STAPLES Center 610176 19068
3 4.0 Toronto Raptors* 26.6 46.0 18.0 44 20 6.45 -0.57 5.88 111.6 105.2 ... 0.574 0.536 12.8 21.6 0.205 0.502 14.6 76.1 0.200 Scotiabank Arena 633456 19796
4 5.0 Dallas Mavericks 26.2 40.0 27.0 45 22 6.04 -0.21 5.84 116.7 110.6 ... 0.581 0.548 11.3 23.5 0.198 0.519 10.9 77.4 0.172 American Airlines Center 682096 20062
5 6.0 Boston Celtics* 25.3 43.0 21.0 44 20 6.17 -0.48 5.69 112.9 106.8 ... 0.567 0.529 12.0 23.9 0.204 0.510 13.6 77.5 0.212 TD Garden 610864 19090
6 7.0 Houston Rockets* 29.1 40.0 24.0 39 25 3.75 0.03 3.78 113.8 110.2 ... 0.578 0.539 12.6 22.4 0.226 0.528 13.5 75.6 0.194 Toyota Center 578458 18077
7 8.0 Utah Jazz* 27.5 41.0 23.0 38 26 3.17 0.03 3.20 112.6 109.4 ... 0.587 0.552 13.6 21.2 0.208 0.514 10.9 79.0 0.180 Vivint Smart Home Arena 567486 18306
8 9.0 Denver Nuggets* 25.6 43.0 22.0 39 26 2.95 0.06 3.02 112.5 109.5 ... 0.564 0.532 12.3 24.7 0.178 0.526 13.0 77.0 0.194 Pepsi Center 633153 19186
9 10.0 Oklahoma City Thunder* 25.6 40.0 24.0 37 27 2.45 0.34 2.79 111.6 109.1 ... 0.577 0.534 12.3 19.2 0.233 0.520 12.4 76.8 0.164 Chesapeake Energy Arena 600699 18203
10 11.0 Miami Heat* 25.9 41.0 24.0 39 26 3.23 -0.65 2.58 112.7 109.4 ... 0.587 0.549 13.5 20.5 0.231 0.522 12.3 79.7 0.208 AmericanAirlines Arena 629771 19680
11 12.0 Philadelphia 76ers* 26.4 39.0 26.0 37 28 2.22 0.01 2.22 110.4 108.2 ... 0.562 0.530 12.7 23.7 0.189 0.522 12.7 80.4 0.211 Wells Fargo Center 639491 20629
12 13.0 Indiana Pacers* 25.6 39.0 26.0 37 28 1.94 -0.33 1.61 110.3 108.3 ... 0.565 0.533 11.9 20.3 0.170 0.513 12.8 77.1 0.193 Bankers Life Fieldhouse 529002 16531
13 14.0 New Orleans Pelicans 25.4 28.0 36.0 30 34 -0.83 1.13 0.30 110.8 111.6 ... 0.567 0.538 13.7 24.3 0.183 0.531 12.3 78.1 0.207 Smoothie King Center 528172 16505
14 15.0 Orlando Magic 26.0 30.0 35.0 30 35 -0.97 0.12 -0.85 108.0 109.0 ... 0.540 0.503 11.4 22.4 0.191 0.535 13.5 79.0 0.170 Amway Center 529870 17093
15 16.0 Memphis Grizzlies 24.0 32.0 33.0 30 35 -1.08 0.02 -1.05 109.4 110.4 ... 0.561 0.530 13.2 23.2 0.178 0.520 12.6 77.6 0.213 FedEx Forum 523297 15857
16 17.0 Phoenix Suns 24.7 26.0 39.0 30 35 -1.37 0.32 -1.05 110.5 111.8 ... 0.572 0.528 13.3 22.2 0.226 0.543 14.0 78.3 0.221 Talking Stick Resort Arena 550633 15606
17 18.0 Portland Trail Blazers 27.5 29.0 37.0 30 36 -1.61 0.49 -1.11 112.5 114.1 ... 0.566 0.530 11.5 22.0 0.191 0.523 11.0 75.0 0.204 Moda Center 628303 19634
18 19.0 Brooklyn Nets 26.5 30.0 34.0 31 33 -0.64 -0.54 -1.18 108.1 108.7 ... 0.550 0.515 13.4 23.5 0.199 0.507 10.9 77.8 0.181 Barclays Center 524907 16403
19 20.0 San Antonio Spurs 27.9 27.0 36.0 28 35 -1.76 0.57 -1.21 111.9 113.7 ... 0.569 0.529 11.0 19.5 0.206 0.542 11.5 79.2 0.194 AT&T Center 550515 18351
20 21.0 Sacramento Kings 27.1 28.0 36.0 28 36 -1.92 0.48 -1.44 109.7 111.6 ... 0.563 0.531 13.0 21.8 0.178 0.540 13.6 78.5 0.222 Golden 1 Center 520663 16796
21 22.0 Minnesota Timberwolves 24.8 19.0 45.0 24 40 -4.30 0.51 -3.78 108.1 112.2 ... 0.551 0.514 13.0 22.1 0.209 0.541 13.2 77.2 0.218 Target Center 482112 15066
22 23.0 Chicago Bulls 24.4 22.0 43.0 26 39 -3.08 -0.73 -3.81 106.7 109.8 ... 0.547 0.515 13.7 22.8 0.175 0.546 16.3 75.6 0.239 United Center 639352 18804
23 24.0 Detroit Pistons 25.9 20.0 46.0 26 40 -3.56 -0.66 -4.22 109.0 112.7 ... 0.561 0.529 13.8 22.6 0.194 0.541 12.7 75.9 0.186 Little Caesars Arena 509469 15294
24 25.0 Washington Wizards 25.4 24.0 40.0 24 40 -4.05 -0.81 -4.86 111.9 115.8 ... 0.568 0.528 12.1 22.0 0.214 0.560 14.0 74.9 0.230 Capital One Arena 532702 16647
25 26.0 New York Knicks 24.5 21.0 45.0 20 46 -6.45 -0.09 -6.55 106.5 113.0 ... 0.531 0.501 12.6 25.8 0.182 0.541 12.4 78.3 0.224 Madison Square Garden (IV) 620789 18812
26 27.0 Charlotte Hornets 24.3 23.0 42.0 19 46 -6.75 -0.12 -6.88 106.3 113.3 ... 0.539 0.504 13.3 23.9 0.188 0.546 13.1 74.4 0.159 Spectrum Center 478591 15428
27 28.0 Cleveland Cavaliers 25.0 19.0 46.0 18 47 -7.89 0.33 -7.55 107.5 115.4 ... 0.553 0.522 14.6 24.6 0.172 0.560 11.7 77.4 0.164 Quicken Loans Arena 643008 17861
28 29.0 Atlanta Hawks 24.1 20.0 47.0 18 49 -7.97 0.40 -7.57 107.2 114.8 ... 0.554 0.515 13.8 21.6 0.204 0.543 12.7 74.9 0.233 State Farm Arena 545453 16043
29 30.0 Golden State Warriors 24.4 15.0 50.0 16 49 -8.71 0.79 -7.92 105.2 113.8 ... 0.540 0.497 13.2 21.5 0.212 0.553 13.7 76.4 0.193 Chase Center 614176 18064
30 NaN League Average 26.2 NaN NaN 32 32 0.00 0.00 0.00 110.4 110.4 ... 0.564 0.528 12.8 22.6 0.199 0.528 12.8 77.4 0.199 NaN 575820 17788
[31 rows x 28 columns]
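The script above finds the comment through the section heading text. If the heading wording changes, a lookup by table id may hold up better. The sketch below shows that variant on a small inline sample so it runs without network access. The id `misc_stats`, the `SAMPLE_HTML` content, and the helper name `extract_commented_table` are assumptions for illustration, not taken from the real page.

```python
from bs4 import BeautifulSoup, Comment


def extract_commented_table(html, table_id):
    """Return the HTML of <table id=table_id> found inside a comment, or None."""
    soup = BeautifulSoup(html, "html.parser")
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        # Re-parse the comment body as HTML and look for the table by id.
        inner = BeautifulSoup(comment, "html.parser")
        table = inner.find("table", id=table_id)
        if table is not None:
            return str(table)
    return None


# Hypothetical stand-in for the downloaded page
SAMPLE_HTML = """
<div id="all_misc_stats">
<h2>Miscellaneous Stats</h2>
<!--
<table id="misc_stats">
<tr><th>Team</th><th>Arena</th></tr>
<tr><td>Milwaukee Bucks*</td><td>Fiserv Forum</td></tr>
</table>
-->
</div>
"""

table_html = extract_commented_table(SAMPLE_HTML, "misc_stats")
cells = [td.get_text() for td in BeautifulSoup(table_html, "html.parser").find_all("td")]
print(cells)  # ['Milwaukee Bucks*', 'Fiserv Forum']
```

The returned string can then be wrapped in `StringIO` and handed to `pd.read_html`, as in the script above.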