我正在尝试从以下网站抓取一些NFL数据:
url = https://www.pro-football-reference.com/years/2019/opp.htm.
我首先尝试使用大熊猫从表格中抓取数据。我以前做过,而且一直很简单。我希望熊猫能返回该页面上所有表的列表。可是我跑的时候
dfs = pd.read_html(url)
我只从网页上收到了前两个表,Team Defense和Team Advanced Defense。
然后我去尝试用bs4和请求刮擦其他表。为了测试,我首先只尝试刮擦第一个表:
page = requests.get(url)
soup = BeautifulSoup(page.text, 'lxml')
table = soup.find('table', id = 'advanced_defense')
rows = table.find_all('tr')
for tr in rows:
td = tr.find_all('td')
row = [i.text for i in td]
print(row)
然后,我能够简单地更改id
,以便我同时返回了Team Defence和Team Advanced Defense-熊猫返回的两个表。
但是,当我尝试使用相同的方法来刮取页面上的其他表时,出现错误。我以与前两个表相同的方式检查网页获得了id
,但无法获得结果。
page = requests.get(url)
soup = BeautifulSoup(page.text, 'lxml')
table = soup.find('table', id = 'passing')
rows = table.find_all('tr')
for tr in rows:
td = tr.find_all('td')
row = [i.text for i in td]
print(row)
由于我收到以下错误,试图刮除页面上的任何其他表时,找不到table
的任何内容
AttributeError: 'NoneType' object has no attribute 'find_all'
我发现pandas和bs4都只能返回Team Defence和Team Advanced Defense表很奇怪。
我只打算刮擦“团队防守”,“通过防守”和“冲刺防守”表。
我该如何成功地清除“通过防御”和“冲抵防御”表?
答案 0 :(得分:1)
因此体育参考网站的技巧很棘手,因为第一个表(或几个表)确实显示在html源代码中。其他表是动态呈现的。但是,这些其他表在html的注释中。因此,要获取其他表,必须先删除注释,然后可以使用pandas或beautifulsoup来获取这些表标签。
因此,您可以像往常一样获取团队统计信息。然后提取注释并解析其他表。
import pandas as pd
import requests
from bs4 import BeautifulSoup, Comment
url = 'https://www.pro-football-reference.com/years/2019/opp.htm'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
dfs = [pd.read_html(url, header=0, attrs={'id':'team_stats'})[0]]
dfs[0].columns = dfs[0].iloc[0,:]
dfs[0] = dfs[0].iloc[1:,:].reset_index(drop=True)
for each in comments:
if 'table' in each and ('id="passing"' in each or 'id="rushing"' in each):
dfs.append(pd.read_html(each)[0])
输出:
for df in dfs:
print (df)
0 Rk Tm G PF ... 1stPy Sc% TO% EXP
0 1 New England Patriots 16 225 ... 39 19.4 17.3 165.75
1 2 Buffalo Bills 16 259 ... 33 23.6 12.4 39.85
2 3 Baltimore Ravens 16 282 ... 39 32.9 14.6 16.61
3 4 Chicago Bears 16 298 ... 30 31.5 10.7 -4.15
4 5 Minnesota Vikings 16 303 ... 31 34.5 17.0 -7.88
5 6 Pittsburgh Steelers 16 303 ... 30 29.9 19.0 85.78
6 7 Kansas City Chiefs 16 308 ... 39 34.6 13.6 -65.69
7 8 San Francisco 49ers 16 310 ... 30 29.0 14.2 77.41
8 9 Green Bay Packers 16 313 ... 20 34.5 14.1 -63.65
9 10 Denver Broncos 16 316 ... 34 37.3 8.4 -35.98
10 11 Dallas Cowboys 16 321 ... 38 35.5 9.9 -36.81
11 12 Tennessee Titans 16 331 ... 27 32.1 11.8 -54.20
12 13 New Orleans Saints 16 341 ... 43 34.7 12.7 -41.89
13 14 Los Angeles Chargers 16 345 ... 28 37.3 8.2 -86.11
14 15 Philadelphia Eagles 16 354 ... 28 33.9 10.2 -29.57
15 16 New York Jets 16 359 ... 40 34.4 10.1 -0.06
16 17 Los Angeles Rams 16 364 ... 30 33.7 12.7 -11.53
17 18 Indianapolis Colts 16 373 ... 23 39.3 13.1 -58.37
18 19 Houston Texans 16 385 ... 28 39.3 13.1 -160.87
19 20 Cleveland Browns 16 393 ... 37 36.9 11.2 -91.15
20 21 Jacksonville Jaguars 16 397 ... 33 37.4 9.2 -120.09
21 22 Seattle Seahawks 16 398 ... 25 37.1 16.3 -92.02
22 23 Atlanta Falcons 16 399 ... 30 42.8 9.0 -105.34
23 24 Oakland Raiders 16 419 ... 52 41.2 8.5 -159.71
24 25 Cincinnati Bengals 16 420 ... 21 39.8 8.8 -132.66
25 26 Detroit Lions 16 423 ... 39 40.1 9.0 -142.55
26 27 Washington Redskins 16 435 ... 34 41.9 12.2 -135.83
27 28 Arizona Cardinals 16 442 ... 38 42.6 9.5 -174.55
28 29 Tampa Bay Buccaneers 16 449 ... 39 39.6 13.5 12.23
29 30 New York Giants 16 451 ... 32 39.7 8.7 -105.11
30 31 Carolina Panthers 16 470 ... 30 41.4 9.4 -116.88
31 32 Miami Dolphins 16 494 ... 34 45.6 8.8 -175.02
32 NaN Avg Team NaN 365.0 ... 32.9 36.0 11.8 -56.6
33 NaN League Total NaN 11680 ... 1054 36.0 11.8 NaN
34 NaN Avg Tm/G NaN 22.8 ... 2.1 36.0 11.8 NaN
[35 rows x 28 columns]
Rk Tm G Cmp ... NY/A ANY/A Sk% EXP
0 1.0 San Francisco 49ers 16.0 318.0 ... 4.80 4.6 8.5 58.30
1 2.0 New England Patriots 16.0 303.0 ... 5.00 3.5 8.1 117.74
2 3.0 Pittsburgh Steelers 16.0 314.0 ... 5.50 4.7 9.5 20.19
3 4.0 Buffalo Bills 16.0 348.0 ... 5.20 4.7 7.4 30.01
4 5.0 Los Angeles Chargers 16.0 328.0 ... 6.50 6.3 6.1 -92.16
5 6.0 Baltimore Ravens 16.0 318.0 ... 5.70 5.2 6.4 15.40
6 7.0 Cleveland Browns 16.0 318.0 ... 6.30 6.1 6.9 -64.09
7 8.0 Kansas City Chiefs 16.0 352.0 ... 5.70 5.2 7.2 -36.78
8 9.0 Chicago Bears 16.0 362.0 ... 5.90 5.7 5.3 -47.04
9 10.0 Dallas Cowboys 16.0 370.0 ... 5.90 6.1 6.4 -67.46
10 11.0 Denver Broncos 16.0 348.0 ... 6.30 6.1 6.9 -61.45
11 12.0 Los Angeles Rams 16.0 348.0 ... 5.90 5.7 8.2 -42.76
12 13.0 Carolina Panthers 16.0 347.0 ... 6.20 5.8 8.9 -63.03
13 14.0 Green Bay Packers 16.0 326.0 ... 6.30 5.7 7.0 -27.30
14 15.0 Minnesota Vikings 16.0 394.0 ... 5.80 5.3 7.4 -34.01
15 16.0 Jacksonville Jaguars 16.0 327.0 ... 6.70 6.7 8.3 -98.77
16 17.0 New York Jets 16.0 363.0 ... 6.10 6.0 5.6 -79.16
17 18.0 Washington Redskins 16.0 371.0 ... 6.50 6.7 7.8 -135.17
18 19.0 Philadelphia Eagles 16.0 348.0 ... 6.30 6.4 7.0 -88.15
19 20.0 New Orleans Saints 16.0 371.0 ... 5.90 5.8 7.8 -94.59
20 21.0 Cincinnati Bengals 16.0 308.0 ... 7.40 7.4 5.8 -126.81
21 22.0 Atlanta Falcons 16.0 351.0 ... 6.90 7.0 5.0 -128.75
22 23.0 Indianapolis Colts 16.0 394.0 ... 6.60 6.4 6.8 -86.44
23 24.0 Tennessee Titans 16.0 386.0 ... 6.40 6.2 6.7 -92.39
24 25.0 Oakland Raiders 16.0 337.0 ... 7.40 7.8 5.7 -177.69
25 26.0 Miami Dolphins 16.0 344.0 ... 7.40 7.7 4.0 -172.01
26 27.0 Seattle Seahawks 16.0 383.0 ... 6.70 6.2 4.5 -77.18
27 28.0 New York Giants 16.0 369.0 ... 7.10 7.4 6.1 -152.48
28 29.0 Houston Texans 16.0 375.0 ... 6.90 7.1 5.0 -160.60
29 30.0 Tampa Bay Buccaneers 16.0 408.0 ... 6.10 6.2 6.6 -38.17
30 31.0 Arizona Cardinals 16.0 421.0 ... 7.00 7.7 6.2 -190.81
31 32.0 Detroit Lions 16.0 381.0 ... 7.10 7.7 4.4 -162.94
32 NaN Avg Team NaN 354.1 ... 6.29 6.2 6.7 -73.60
33 NaN League Total NaN 11331.0 ... 6.29 6.2 6.7 NaN
34 NaN Avg Tm/G NaN 22.1 ... 6.29 6.2 6.7 NaN
[35 rows x 25 columns]
Rk Tm G Att ... TD Y/A Y/G EXP
0 1.0 Tampa Bay Buccaneers 16.0 362.0 ... 11.0 3.3 73.8 56.23
1 2.0 New York Jets 16.0 417.0 ... 12.0 3.3 86.9 72.34
2 3.0 Philadelphia Eagles 16.0 353.0 ... 13.0 4.1 90.1 47.64
3 4.0 New Orleans Saints 16.0 345.0 ... 12.0 4.2 91.3 39.45
4 5.0 Baltimore Ravens 16.0 340.0 ... 12.0 4.4 93.4 -1.25
5 6.0 New England Patriots 16.0 365.0 ... 7.0 4.2 95.5 33.13
6 7.0 Indianapolis Colts 16.0 383.0 ... 8.0 4.1 97.9 21.54
7 8.0 Oakland Raiders 16.0 405.0 ... 15.0 3.9 98.1 17.69
8 9.0 Chicago Bears 16.0 414.0 ... 16.0 3.9 102.0 38.83
9 10.0 Buffalo Bills 16.0 388.0 ... 12.0 4.3 103.1 10.92
10 11.0 Dallas Cowboys 16.0 407.0 ... 14.0 4.1 103.5 25.11
11 12.0 Tennessee Titans 16.0 415.0 ... 14.0 4.0 104.5 28.27
12 13.0 Minnesota Vikings 16.0 404.0 ... 8.0 4.3 108.0 21.01
13 14.0 Pittsburgh Steelers 16.0 462.0 ... 7.0 3.8 109.6 63.09
14 15.0 Atlanta Falcons 16.0 421.0 ... 13.0 4.2 110.9 17.98
15 16.0 Denver Broncos 16.0 426.0 ... 9.0 4.2 111.4 12.72
16 17.0 San Francisco 49ers 16.0 401.0 ... 11.0 4.5 112.6 9.91
17 18.0 Los Angeles Chargers 16.0 429.0 ... 15.0 4.2 112.8 1.08
18 19.0 Los Angeles Rams 16.0 444.0 ... 15.0 4.1 113.1 21.49
19 20.0 New York Giants 16.0 469.0 ... 19.0 3.9 113.3 40.51
20 21.0 Detroit Lions 16.0 455.0 ... 13.0 4.1 115.9 17.32
21 22.0 Seattle Seahawks 16.0 388.0 ... 22.0 4.9 117.7 -17.45
22 23.0 Green Bay Packers 16.0 411.0 ... 15.0 4.7 120.1 -42.18
23 24.0 Arizona Cardinals 16.0 439.0 ... 9.0 4.4 120.1 15.13
24 25.0 Houston Texans 16.0 403.0 ... 12.0 4.8 121.1 -6.34
25 26.0 Kansas City Chiefs 16.0 416.0 ... 14.0 4.9 128.2 -41.35
26 27.0 Miami Dolphins 16.0 485.0 ... 15.0 4.5 135.4 -6.14
27 28.0 Jacksonville Jaguars 16.0 435.0 ... 23.0 5.1 139.3 -21.95
28 29.0 Carolina Panthers 16.0 445.0 ... 31.0 5.2 143.5 -62.69
29 30.0 Cleveland Browns 16.0 463.0 ... 19.0 5.0 144.7 -37.50
30 31.0 Washington Redskins 16.0 493.0 ... 14.0 4.7 146.2 -6.89
31 32.0 Cincinnati Bengals 16.0 504.0 ... 17.0 4.7 148.9 -12.07
32 NaN Avg Team NaN 418.3 ... 14.0 4.3 112.9 11.10
33 NaN League Total NaN 13387.0 ... 447.0 4.3 112.9 NaN
34 NaN Avg Tm/G NaN 26.1 ... 0.9 4.3 112.9 NaN
[35 rows x 9 columns]