I'm trying to scrape a table from the NFL website, but I keep getting errors and can't figure out what I'm doing wrong.
The code I'm using is:
import pandas
import urllib2
#specify the url
NFLpage = "http://www.espn.com/nhl/statistics/player/_/stat/points/sort/points/year/2015/seasontype/2"
#Query the website and return the html to the variable 'page'
page = urllib2.urlopen(NFLpage)
#import the Beautiful soup functions to parse the data returned from the website
from bs4 import BeautifulSoup
#Parse the html in the 'page' variable, and store it in Beautiful Soup format
soup = BeautifulSoup(page)
print soup.prettify(page)
#Find the right table
all_tables=soup.find_all('table')
right_table=soup.find('table', class_='tablehead')
right_table
for row in right_table.findAll("tr"):
    col = row.find_all('td')
    column_1 = col[0].string.strip()
    RK.append(column_1)
    column_2 = col[1].string.strip()
    PLAYER.append(column_2)
    column_3 = col[2].string.strip()
    TEAM.append(column_3)
    column_4 = col[3].string.strip()
    GP.append(column_4)
    column_5 = col[4].string.strip()
    G1.append(column_5)
    column_6 = col[5].string.strip()
    A1.append(column_6)
    column_7 = col[6].string.strip()
    PTS.append(column_7)
    column_8 = col[7].string.strip()
    Diff.append(column_8)
    column_9 = col[8].string.strip()
    PIM.append(column_9)
    column_10 = col[9].string.strip()
    PTSG.append(column_10)
    column_11 = col[10].string.strip()
    SOG.append(column_11)
    column_12 = col[11].string.strip()
    PCT.append(column_12)
    column_13 = col[12].string.strip()
    GWG.append(column_13)
    column_14 = col[13].string.strip()
    G2.append(column_14)
    column_15 = col[14].string.strip()
    A2.append(column_15)
    column_16 = col[15].string.strip()
    G3.append(column_16)
    column_17 = col[15].string.strip()
    A3.append(column_17)
columns = {'RK': RK, 'PLAYER':PLAYER, 'TEAM'=TEAM, 'GP': GP, 'G1': G1, 'A1': A1, 'PTS': PTS, 'Diff'=Diff, 'PIM'=PIM, 'PTSG'=PTSG, 'SOG'=SOG, 'PCT'=PCT, 'GWG'=GWG, 'G2'=G2, 'A2'=A2, 'G3'=G3,'A3'=A3}
df = pd.DataFrame(columns)
df
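I'm currently getting an error on the column assignment line (the third line from the end). Can you help me see what I'm doing wrong?
Cheers, Andreia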
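For what it's worth, the error on the `columns = {...}` line is most likely a syntax problem: a dict literal separates each key from its value with `:`, but several of the pairs there use `=`. Below is a minimal sketch of the corrected pattern. It assumes a few hypothetical sample rows in place of the scraped `td` cells and keeps only four of the columns. The lists also need to exist before `.append()` is called on them.

```python
# Hypothetical sample rows standing in for the text of the scraped <td> cells.
rows = [
    ["1", "Jamie Benn, LW", "DAL", "82"],
    ["2", "John Tavares, C", "NYI", "82"],
]

# Create the lists first; calling .append() on an undefined name raises NameError.
RK, PLAYER, TEAM, GP = [], [], [], []

for col in rows:
    RK.append(col[0].strip())
    PLAYER.append(col[1].strip())
    TEAM.append(col[2].strip())
    GP.append(col[3].strip())

# Every key/value pair uses ':'. Writing 'TEAM'=TEAM inside the braces is a SyntaxError.
columns = {'RK': RK, 'PLAYER': PLAYER, 'TEAM': TEAM, 'GP': GP}
```

With `import pandas as pd` at the top (the question imports `pandas` but later refers to `pd`), `pd.DataFrame(columns)` should then accept this dict, since all the lists have the same length.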
Answer 0 (score: 1)
pandas can read tables directly from a URL; see the documentation for details.
import pandas as pd
pd.read_html('http://www.espn.com/nhl/statistics/player/_/stat/points/sort/points/year/2015/seasontype/2')
Out:
[ 0 1 2 3 4 5 6 7 8 9 \
0 NaN PP SH NaN NaN NaN NaN NaN NaN NaN
1 RK PLAYER TEAM GP G A PTS +/- PIM PTS/G
2 1 Jamie Benn, LW DAL 82 35 52 87 1 64 1.06
3 2 John Tavares, C NYI 82 38 48 86 5 46 1.05
4 3 Sidney Crosby, C PIT 77 28 56 84 5 47 1.09
5 4 Alex Ovechkin, LW WSH 81 53 28 81 10 58 1.00
6 NaN Jakub Voracek, RW PHI 82 22 59 81 1 78 0.99
7 6 Nicklas Backstrom, C WSH 82 18 60 78 5 40 0.95
8 7 Tyler Seguin, C DAL 71 37 40 77 -1 20 1.08
9 8 Jiri Hudler, LW CGY 78 31 45 76 17 14 0.97
10 NaN Daniel Sedin, LW VAN 82 20 56 76 5 18 0.93