BeautifulSoup and urlopen aren't fetching the correct table

Time: 2019-08-06 20:18:07

Tags: python-3.x beautifulsoup urlopen

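I'm practicing BeautifulSoup and urlopen using the Basketball-Reference dataset. When I fetch an individual player's statistics, everything works fine, but when I use the same code for team statistics, urlopen apparently doesn't find the correct table.

The following code gets the headers from the page.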


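from urllib.request import urlopen
from bs4 import BeautifulSoup

def fetch_years():

  # Determine the URL
  url = "https://www.basketball-reference.com/leagues/NBA_2000.html?sr&utm_source=direct&utm_medium=Share&utm_campaign=ShareTool#team-stats-per_game::none"

  html = urlopen(url)

  # Name the parser explicitly to avoid the "no parser specified" warning
  soup = BeautifulSoup(html, "html.parser")

  # Read the header cells from the first <tr> found on the page
  headers = [th.get_text() for th in soup.find_all('tr')[0].find_all('th')]
  headers = headers[1:]
  print(headers)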

I'm trying to get the per-game team statistics, with headers in this format:

['Tm', 'G', 'MP', 'FG', ...]

Instead, the headers I get are:

['W', 'L', 'W/L%', ...] 

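This is the first table on the page with team information for the 1999-2000 season (named "Division Standings").

If you use the same code on player data, such as this one, you get the result I'm looking for: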

  Age   Tm   Lg Pos   G  GS    MP   FG  ...  DRB  TRB  AST  STL  BLK  TOV   PF   PTS
0  20  OKC  NBA  PG  82  65  32.5  5.3  ...  2.7  4.9  5.3  1.3  0.2  3.3  2.3  15.3
1  21  OKC  NBA  PG  82  82  34.3  5.9  ...  3.1  4.9  8.0  1.3  0.4  3.3  2.5  16.1
2  22  OKC  NBA  PG  82  82  34.7  7.5  ...  3.1  4.6  8.2  1.9  0.4  3.9  2.5  21.9
3  23  OKC  NBA  PG  66  66  35.3  8.8  ...  3.1  4.6  5.5  1.7  0.3  3.6  2.2  23.6
4  24  OKC  NBA  PG  82  82  34.9  8.2  ...  3.9  5.2  7.4  1.8  0.3  3.3  2.3  23.2

The web scraping code originally came from here

1 Answer:

Answer 0: (score: 2)

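Sports-reference.com sites are trickier than a standard website. Most tables (with a few exceptions on the page) are rendered after the page loads, so you would need to render the page with Selenium first and then pull the HTML source.

Another option, however: if you look at the HTML source, you'll see that those tables sit inside comments. You can use BeautifulSoup to pull out the comment nodes and then search them for table tags.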
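To make the comment mechanism concrete, here is a minimal offline sketch. The HTML string, the table ids, and the two helper functions are made up for illustration; they only mimic the situation described above, where one table is ordinary markup and another is wrapped in a comment.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical miniature page: one table in the normal markup,
# one table hidden inside an HTML comment.
SAMPLE_HTML = """
<div id="visible">
  <table id="standings"><tr><th>Team</th><th>W</th><th>L</th></tr></table>
</div>
<div id="hidden">
  <!--
  <table id="per_game"><tr><th>Rk</th><th>Team</th><th>G</th></tr></table>
  -->
</div>
"""

def visible_table_ids(html):
    """Return the ids of tables the parser sees as real elements."""
    soup = BeautifulSoup(html, "html.parser")
    return [table.get("id") for table in soup.find_all("table")]

def commented_table_ids(html):
    """Return the ids of tables found by re-parsing each HTML comment."""
    soup = BeautifulSoup(html, "html.parser")
    ids = []
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        # A comment is plain text to the parser, so parse its content again.
        inner = BeautifulSoup(str(comment), "html.parser")
        ids.extend(table.get("id") for table in inner.find_all("table"))
    return ids

print(visible_table_ids(SAMPLE_HTML))    # ['standings']
print(commented_table_ids(SAMPLE_HTML))  # ['per_game']
```

A plain find_all on the parsed page only reaches the first table, which matches what the question observed: the first tr on the page belongs to a table that is not the one being looked for.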

The function below returns a list of DataFrames; the "per game" team statistics are in the table at index position 1:

from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup, Comment

def fetch_years():

    # Determine the URL
    url = "https://www.basketball-reference.com/leagues/NBA_2000.html?sr&utm_source=direct&utm_medium=Share&utm_campaign=ShareTool#team-stats-per_game::none"
    response = requests.get(url)

    # The stats tables sit inside HTML comments, so collect every comment node
    soup = BeautifulSoup(response.text, "html.parser")
    comments = soup.find_all(string=lambda text: isinstance(text, Comment))

    tables = []
    for each in comments:
        if 'table' in each:
            try:
                # StringIO wrapper: recent pandas versions deprecate passing literal HTML
                tables.append(pd.read_html(StringIO(str(each)))[0])
            except Exception:
                # Skip comments that do not contain a parseable table
                continue
    return tables

tables = fetch_years()

Output:

print (tables[1].to_string())
      Rk                     Team   G     MP    FG   FGA    FG%   3P   3PA    3P%    2P   2PA    2P%    FT   FTA    FT%   ORB   DRB   TRB   AST  STL  BLK   TOV    PF    PTS
0    1.0        Sacramento Kings*  82  241.5  40.0  88.9  0.450  6.5  20.2  0.322  33.4  68.7  0.487  18.5  24.6  0.754  12.9  32.1  45.0  23.8  9.6  4.6  16.2  21.1  105.0
1    2.0         Detroit Pistons*  82  241.8  37.1  80.9  0.459  5.4  14.9  0.359  31.8  66.0  0.481  23.9  30.6  0.781  11.2  30.0  41.2  20.8  8.1  3.3  15.7  24.5  103.5
2    3.0         Dallas Mavericks  82  240.6  39.0  85.9  0.453  6.3  16.2  0.391  32.6  69.8  0.468  17.2  21.4  0.804  11.4  29.8  41.2  22.1  7.2  5.1  13.7  21.6  101.4
3    4.0          Indiana Pacers*  82  240.6  37.2  81.0  0.459  7.1  18.1  0.392  30.0  62.8  0.478  19.9  24.5  0.811  10.3  31.9  42.1  22.6  6.8  5.1  14.1  21.8  101.3
4    5.0         Milwaukee Bucks*  82  242.1  38.7  83.3  0.465  4.8  13.0  0.369  33.9  70.2  0.483  19.0  24.2  0.786  12.4  28.9  41.3  22.6  8.2  4.6  15.0  24.6  101.2
5    6.0      Los Angeles Lakers*  82  241.5  38.3  83.4  0.459  4.2  12.8  0.329  34.1  70.6  0.482  20.1  28.9  0.696  13.6  33.4  47.0  23.4  7.5  6.5  13.9  22.5  100.8
6    7.0            Orlando Magic  82  240.9  38.6  85.5  0.452  3.6  10.6  0.338  35.1  74.9  0.468  19.2  26.1  0.735  14.0  31.0  44.9  20.8  9.1  5.7  17.6  24.0  100.1
7    8.0          Houston Rockets  82  241.8  36.6  81.3  0.450  7.1  19.8  0.358  29.5  61.5  0.480  19.2  26.2  0.733  12.3  31.5  43.8  21.6  7.5  5.3  17.4  20.3   99.5
8    9.0           Boston Celtics  82  240.6  37.2  83.9  0.444  5.1  15.4  0.331  32.2  68.5  0.469  19.8  26.5  0.745  13.5  29.5  43.0  21.2  9.7  3.5  15.4  27.1   99.3
9   10.0     Seattle SuperSonics*  82  241.2  37.9  84.7  0.447  6.7  19.6  0.339  31.2  65.1  0.480  16.6  23.9  0.695  12.7  30.3  43.0  22.9  8.0  4.2  14.0  21.7   99.1
10  11.0           Denver Nuggets  82  242.1  37.3  84.3  0.442  5.7  17.0  0.336  31.5  67.2  0.469  18.7  25.8  0.724  13.1  31.6  44.7  23.3  6.8  7.5  15.6  23.9   99.0
11  12.0            Phoenix Suns*  82  241.5  37.7  82.6  0.457  5.6  15.2  0.368  32.1  67.4  0.477  17.9  23.6  0.759  12.5  31.2  43.7  25.6  9.1  5.3  16.7  24.1   98.9
12  13.0  Minnesota Timberwolves*  82  242.7  39.3  84.3  0.467  3.0   8.7  0.346  36.3  75.5  0.481  16.8  21.6  0.780  12.4  30.1  42.5  26.9  7.6  5.4  13.9  23.3   98.5
13  14.0       Charlotte Hornets*  82  241.2  35.8  79.7  0.449  4.1  12.2  0.339  31.7  67.5  0.469  22.7  30.0  0.758  10.8  32.1  42.9  24.7  8.9  5.9  14.7  20.4   98.4
14  15.0          New Jersey Nets  82  241.8  36.3  83.9  0.433  5.8  16.8  0.347  30.5  67.2  0.454  19.5  24.9  0.784  12.7  28.2  40.9  20.6  8.8  4.8  13.6  23.3   98.0
15  16.0  Portland Trail Blazers*  82  241.2  36.8  78.4  0.470  5.0  13.8  0.361  31.9  64.7  0.493  18.8  24.7  0.760  11.8  31.2  43.0  23.5  7.7  4.8  15.2  22.7   97.5
16  17.0         Toronto Raptors*  82  240.9  36.3  83.9  0.433  5.2  14.3  0.363  31.2  69.6  0.447  19.3  25.2  0.765  13.4  29.9  43.3  23.7  8.1  6.6  13.9  24.3   97.2
17  18.0      Cleveland Cavaliers  82  242.1  36.3  82.1  0.442  4.2  11.2  0.373  32.1  70.9  0.453  20.2  26.9  0.750  12.3  30.5  42.8  23.7  8.7  4.4  17.4  27.1   97.0
18  19.0       Washington Wizards  82  241.5  36.7  81.5  0.451  4.1  10.9  0.376  32.6  70.6  0.462  19.1  25.7  0.743  13.0  29.7  42.7  21.6  7.2  4.7  16.1  26.2   96.6
19  20.0               Utah Jazz*  82  240.9  36.1  77.8  0.464  4.0  10.4  0.385  32.1  67.4  0.476  20.3  26.2  0.773  11.4  29.6  41.0  24.9  7.7  5.4  14.9  24.5   96.5
20  21.0       San Antonio Spurs*  82  242.1  36.0  78.0  0.462  4.0  10.8  0.374  32.0  67.2  0.476  20.1  27.0  0.746  11.3  32.5  43.8  22.2  7.5  6.7  15.0  20.9   96.2
21  22.0    Golden State Warriors  82  240.9  36.5  87.1  0.420  4.2  13.0  0.323  32.3  74.0  0.437  18.3  26.2  0.697  15.9  29.7  45.6  22.6  8.9  4.3  15.9  24.9   95.5
22  23.0      Philadelphia 76ers*  82  241.8  36.5  82.6  0.442  2.5   7.8  0.323  34.0  74.8  0.454  19.2  27.1  0.708  14.0  30.1  44.1  22.2  9.6  4.7  15.7  23.6   94.8
23  24.0              Miami Heat*  82  241.8  36.3  78.8  0.460  5.4  14.7  0.371  30.8  64.1  0.481  16.4  22.3  0.736  11.2  31.9  43.2  23.5  7.1  6.4  15.0  23.7   94.4
24  25.0            Atlanta Hawks  82  241.8  36.6  83.0  0.441  3.1   9.9  0.317  33.4  73.1  0.458  18.0  24.2  0.743  14.0  31.3  45.3  18.9  6.1  5.6  15.4  21.0   94.3
25  26.0      Vancouver Grizzlies  82  242.1  35.3  78.5  0.449  4.0  11.0  0.361  31.3  67.6  0.463  19.4  25.1  0.774  12.3  28.3  40.6  20.7  7.4  4.2  16.8  22.9   93.9
26  27.0         New York Knicks*  82  241.8  35.3  77.7  0.455  4.3  11.4  0.375  31.0  66.3  0.468  17.2  22.0  0.781   9.8  30.7  40.5  19.4  6.3  4.3  14.6  24.2   92.1
27  28.0     Los Angeles Clippers  82  240.3  35.1  82.4  0.426  5.2  15.5  0.339  29.9  67.0  0.446  16.6  22.3  0.746  11.6  29.0  40.6  18.0  7.0  6.0  16.2  22.2   92.0
28  29.0            Chicago Bulls  82  241.5  31.3  75.4  0.415  4.1  12.6  0.329  27.1  62.8  0.432  18.1  25.5  0.709  12.6  28.3  40.9  20.1  7.9  4.7  19.0  23.3   84.8
29   NaN           League Average  82  241.5  36.8  82.1  0.449  4.8  13.7  0.353  32.0  68.4  0.468  19.0  25.3  0.750  12.4  30.5  42.9  22.3  7.9  5.2  15.5  23.3   97.5
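
Relying on index position 1 can break if the page adds or reorders commented tables. As a hedged alternative, the sketch below looks a table up by its id attribute, first in the visible markup and then inside comments. The id "team-stats-per_game" is an assumption taken from the URL fragment in the question, and the sample HTML and both helper names are hypothetical; check the real id in the page source before relying on it.

```python
from bs4 import BeautifulSoup, Comment

def find_table_by_id(html, table_id):
    """Return the <table> with the given id, or None.

    Looks in the regular markup first, then inside HTML comments.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is not None:
        return table
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        if table_id not in comment:
            continue  # cheap pre-check before re-parsing the comment
        inner = BeautifulSoup(str(comment), "html.parser")
        table = inner.find("table", id=table_id)
        if table is not None:
            return table
    return None

def header_names(table):
    """Return the text of each <th> in the table's first row."""
    first_row = table.find("tr")
    return [th.get_text(strip=True) for th in first_row.find_all("th")]

# Hypothetical sample standing in for the downloaded page source.
SAMPLE_HTML = """
<table id="divs_standings"><tr><th>Team</th><th>W</th><th>L</th></tr></table>
<!--
<table id="team-stats-per_game">
  <thead><tr><th>Rk</th><th>Team</th><th>G</th><th>MP</th></tr></thead>
  <tbody><tr><th>1</th><td>Sacramento Kings*</td><td>82</td><td>241.5</td></tr></tbody>
</table>
-->
"""

table = find_table_by_id(SAMPLE_HTML, "team-stats-per_game")
print(header_names(table))  # ['Rk', 'Team', 'G', 'MP']
```

With a real page, the same lookup could be applied to the text returned by requests.get, and a None result would signal that the assumed id does not exist on that page.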