Web scraping the Miscellaneous Stats table from Basketball Reference (NBA)

Date: 2021-02-03 11:32:37

Tags: python html web-scraping beautifulsoup

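I'm new to web scraping and am trying to use BeautifulSoup to retrieve the Miscellaneous Stats table from https://www.basketball-reference.com/leagues/NBA_2021.html. I wrote some code, but I can't print the table I want; the code just prints None.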

from bs4 import BeautifulSoup
from urllib.request import urlopen
import pandas as pd 

url = "http://www.basketball-reference.com/leagues/NBA_2021.html"
data = urlopen(url)
soup = BeautifulSoup(data)

table = soup.find('table', id='misc_stats')
print(table)

Any help would be greatly appreciated. Thanks.

1 Answer:

Answer 0: (score: 0)

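The sports-reference.com sites keep some of these tables inside comments in the HTML source. So you need to pull out the comments and then parse the tables from there: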

from io import StringIO

import requests
from bs4 import BeautifulSoup, Comment
import pandas as pd

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36'}

url = "http://www.basketball-reference.com/leagues/NBA_2021.html"
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')

# Collect every HTML comment node on the page
comments = soup.find_all(string=lambda text: isinstance(text, Comment))

tables = []
for each in comments:
    # Only try to parse comments that might contain a table
    if 'table' in str(each):
        try:
            # header=1 takes the second header row as the column names
            tables.append(pd.read_html(StringIO(str(each)), attrs={'id': 'misc_stats'}, header=1)[0])
        except ValueError:
            # read_html raises ValueError when no table matches attrs
            continue

df = tables[0]

Output:

print(df.to_string())
      Rk                    Team   Age     W     L  PW  PL   MOV   SOS   SRS   ORtg   DRtg  NRtg   Pace    FTr   3PAr    TS%   eFG%  TOV%  ORB%  FT/FGA  eFG%.1  TOV%.1  DRB%  FT/FGA.1                       Arena  Attend.  Attend./G
0    1.0      Los Angeles Lakers  29.0  16.0   6.0  16   6  7.73 -0.08  7.65  112.6  104.8   7.8   99.3  0.257  0.352  0.578  0.547  13.1  23.3   0.193   0.513    12.9  80.5     0.152              STAPLES Center        0          0
1    2.0               Utah Jazz  28.5  16.0   5.0  15   6  7.57 -0.20  7.38  115.9  108.2   7.7   98.2  0.245  0.485  0.587  0.557  13.1  26.2   0.185   0.507    10.4  79.7     0.152     Vivint Smart Home Arena    21290       1935
2    3.0         Milwaukee Bucks  28.0  12.0   8.0  15   5  8.15 -0.79  7.36  118.4  110.4   8.0  101.8  0.225  0.425  0.598  0.576  12.5  24.7   0.164   0.532    11.7  79.7     0.161                Fiserv Forum        0          0
3    4.0    Los Angeles Clippers  29.1  16.0   6.0  16   6  7.23 -1.13  6.10  117.8  110.4   7.4   97.4  0.235  0.413  0.603  0.565  12.1  22.2   0.200   0.537    13.3  80.0     0.189              STAPLES Center        0          0
4    5.0          Denver Nuggets  26.5  12.0   8.0  13   7  4.95 -0.19  4.76  117.0  112.0   5.0   97.1  0.244  0.383  0.587  0.556  12.5  26.1   0.187   0.551    13.8  78.7     0.204                  Ball Arena        0          0
5    6.0           Brooklyn Nets  28.1  14.0   9.0  14   9  4.48 -0.56  3.92  117.9  113.5   4.4  101.9  0.264  0.415  0.620  0.584  13.4  20.5   0.217   0.524    10.6  77.6     0.192             Barclays Center        0          0
6    7.0            Phoenix Suns  26.6  11.0   8.0  11   8  2.84  0.24  3.08  110.8  108.0   2.8   97.5  0.214  0.428  0.572  0.537  12.3  18.9   0.179   0.521    12.4  80.0     0.193          Phoenix Suns Arena        0          0
7    8.0      Philadelphia 76ers  26.7  15.0   6.0  13   8  4.19 -1.13  3.06  111.5  107.4   4.1  101.6  0.299  0.351  0.576  0.538  13.8  23.7   0.228   0.515    13.4  77.9     0.199          Wells Fargo Center        0          0
8    9.0           Atlanta Hawks  24.3  10.0  10.0  12   8  2.50  0.26  2.76  112.2  109.7   2.5   99.2  0.298  0.396  0.564  0.517  12.9  25.1   0.243   0.506    11.5  77.1     0.203            State Farm Arena     3529        353
9   10.0          Boston Celtics  25.5  11.0   8.0  11   8  2.53 -0.03  2.50  112.4  109.8   2.6   99.3  0.236  0.359  0.570  0.541  13.4  25.3   0.178   0.536    13.8  77.9     0.209                   TD Garden        0          0
10  11.0       Memphis Grizzlies  24.8   9.0   7.0   9   7  1.31  1.15  2.47  108.9  107.6   1.3  100.6  0.192  0.327  0.551  0.523  12.4  23.0   0.149   0.530    14.6  77.5     0.190                 FedEx Forum      410         51
11  12.0          Indiana Pacers  26.8  12.0   9.0  12   9  2.71 -0.33  2.38  113.0  110.3   2.7   99.9  0.238  0.381  0.583  0.553  12.6  20.4   0.182   0.533    13.3  76.9     0.194     Bankers Life Fieldhouse        0          0
12  13.0         Houston Rockets  28.4  10.0   9.0  11   8  2.95 -0.97  1.98  109.4  106.5   2.9  102.1  0.255  0.445  0.573  0.541  13.7  19.3   0.193   0.512    13.5  76.8     0.195               Toyota Center    28141       3127
13  14.0         Toronto Raptors  27.2   9.0  12.0  12   9  1.67 -1.33  0.34  111.6  109.9   1.7  100.2  0.238  0.479  0.570  0.532  12.9  22.0   0.195   0.533    14.9  77.4     0.234                Amalie Arena    10989        999
14  15.0        Dallas Mavericks  26.4   8.0  13.0   9  12 -2.00  2.00  0.00  109.6  111.6  -2.0   98.7  0.264  0.411  0.559  0.525  11.3  18.5   0.199   0.530    12.7  76.7     0.216    American Airlines Center        0          0
15  16.0       San Antonio Spurs  26.9  11.0  10.0  10  11 -1.05  0.92 -0.13  110.3  111.3  -1.0  100.3  0.224  0.331  0.550  0.516  10.0  19.9   0.175   0.547    12.5  78.8     0.156                 AT&T Center        0          0
16  17.0   Golden State Warriors  26.7  11.0  10.0  10  11 -1.05  0.77 -0.28  108.6  109.6  -1.0  103.2  0.262  0.417  0.563  0.527  12.7  18.4   0.201   0.514    13.5  75.8     0.249                Chase Center        0          0
17  18.0       Charlotte Hornets  24.8  10.0  11.0  10  11 -0.62 -0.41 -1.03  110.2  110.8  -0.6   99.0  0.247  0.414  0.560  0.529  13.1  23.3   0.185   0.544    14.1  75.0     0.166             Spectrum Center        0          0
18  19.0         New York Knicks  24.4   9.0  13.0  10  12 -2.00  0.53 -1.47  107.1  109.2  -2.1   95.4  0.264  0.319  0.538  0.500  12.6  23.8   0.203   0.503    10.7  76.9     0.198  Madison Square Garden (IV)        0          0
19  20.0  Portland Trail Blazers  27.3  11.0   9.0   9  11 -1.65 -0.07 -1.72  115.0  116.6  -1.6   99.8  0.229  0.460  0.567  0.529  10.1  21.5   0.190   0.560    12.2  78.0     0.209                 Moda Center        0          0
20  21.0           Chicago Bulls  24.9   8.0  11.0   8  11 -2.26  0.36 -1.90  110.9  113.1  -2.2  103.4  0.246  0.413  0.590  0.556  15.2  20.8   0.196   0.553    12.9  80.0     0.217               United Center        0          0
21  22.0    New Orleans Pelicans  25.1   7.0  12.0   8  11 -2.58 -0.17 -2.75  110.3  112.8  -2.5   99.6  0.284  0.365  0.558  0.526  13.4  25.6   0.203   0.549    12.8  79.9     0.193        Smoothie King Center     8820        980
22  23.0         Detroit Pistons  26.3   5.0  16.0   7  14 -4.67  1.82 -2.85  107.7  112.4  -4.7   98.5  0.273  0.408  0.544  0.501  13.0  22.6   0.215   0.558    14.2  76.6     0.194        Little Caesars Arena        0          0
23  24.0     Cleveland Cavaliers  24.7  10.0  11.0   8  13 -4.19  0.04 -4.15  104.9  109.1  -4.2   97.2  0.254  0.309  0.536  0.505  14.4  25.9   0.181   0.537    14.9  75.3     0.170         Quicken Loans Arena    12564       1142
24  25.0              Miami Heat  26.7   7.0  13.0   7  13 -5.45  0.29 -5.16  106.9  112.3  -5.4   98.9  0.263  0.452  0.581  0.547  15.7  17.0   0.204   0.543    13.3  76.6     0.183      AmericanAirlines Arena        0          0
25  26.0        Sacramento Kings  25.7   9.0  11.0   7  13 -5.80  0.45 -5.35  112.7  118.4  -5.7  100.1  0.283  0.377  0.576  0.546  13.0  23.5   0.203   0.558    11.6  75.8     0.194             Golden 1 Center        0          0
26  27.0      Washington Wizards  26.2   4.0  13.0   6  11 -5.29 -0.85 -6.14  112.1  117.2  -5.1  104.4  0.282  0.374  0.569  0.534  11.6  20.7   0.212   0.565    12.8  78.9     0.251           Capital One Arena        0          0
27  28.0   Oklahoma City Thunder  23.7   8.0  11.0   5  14 -8.26  0.61 -7.66  105.2  113.3  -8.1  101.3  0.243  0.446  0.556  0.527  12.9  15.7   0.176   0.537    10.9  77.7     0.157     Chesapeake Energy Arena        0          0
28  29.0           Orlando Magic  26.2   8.0  14.0   6  16 -6.82 -1.40 -8.22  105.5  112.3  -6.8   98.9  0.220  0.358  0.526  0.490  12.2  24.1   0.174   0.547    12.4  79.7     0.173                Amway Center    35768       3252
29  30.0  Minnesota Timberwolves  23.5   5.0  15.0   5  15 -9.30  0.55 -8.76  104.6  113.7  -9.1  101.1  0.230  0.377  0.530  0.497  12.7  23.3   0.174   0.539    13.3  75.0     0.217               Target Center        0          0
30   NaN          League Average  26.3   NaN   NaN  10  10  0.00  0.00  0.00  111.1  111.1   NaN   99.8  0.250  0.396  0.568  0.534  12.8  22.2   0.193   0.534    12.8  77.8     0.193                         NaN     4050        400

If you look at the source HTML, you'll see that the tables inside comments start with <!--

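BeautifulSoup treats these as comment nodes rather than tags, so soup.find('table', ...) skips them. Hence, you need to add a part to your code that specifically looks for comments: comments = soup.find_all(string=lambda text: isinstance(text, Comment)). Once you have all the comments, you can iterate through each one to see whether it contains a table. If it does, parse it just as you normally would an uncommented <table> tag.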
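To see the behaviour without hitting the site, here is a minimal offline sketch. The HTML string is hand-written for illustration and only mimics the layout described above: the div ids and the per_game table are assumptions, not copied from the real page. It shows that find() returns None for a commented-out table and that parsing the comment text again exposes it.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical markup: one table sits in the normal tag tree,
# the other is wrapped in an HTML comment.
html = """
<div id="all_per_game">
  <table id="per_game"><tr><td>visible</td></tr></table>
</div>
<div id="all_misc_stats">
  <!--
  <table id="misc_stats">
    <tr><th>Team</th><th>Pace</th></tr>
    <tr><td>Utah Jazz</td><td>98.2</td></tr>
  </table>
  -->
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# The commented-out table is not part of the tag tree, so this is None.
direct = soup.find("table", id="misc_stats")

# Comment nodes are strings; collect them and parse each one on its own.
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
hidden = None
for comment in comments:
    inner = BeautifulSoup(str(comment), "html.parser")
    candidate = inner.find("table", id="misc_stats")
    if candidate is not None:
        hidden = candidate
        break

# Flatten the recovered table into a list of rows for inspection.
rows = [
    [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    for tr in hidden.find_all("tr")
]

print(direct)
print(rows)
```

Re-parsing the comment with BeautifulSoup instead of pandas keeps the sketch free of extra dependencies. In the answer's code, pd.read_html plays the same role and also builds the DataFrame.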
