Scraping a table from a website with BeautifulSoup: error at the end

Asked: 2017-01-19 15:53:30

Tags: python beautifulsoup scrape

I am trying to scrape a table from the NFL website, but I keep getting an error and cannot tell what I am doing wrong.

The code I am using is:

import pandas
import urllib2

#specify the url
NFLpage = "http://www.espn.com/nhl/statistics/player/_/stat/points/sort/points/year/2015/seasontype/2"

#Query the website and return the html to the variable 'page'
page = urllib2.urlopen(NFLpage)

#import the Beautiful soup functions to parse the data returned from the website
from bs4 import BeautifulSoup

#Parse the html in the 'page' variable, and store it in Beautiful Soup format
soup = BeautifulSoup(page)

print soup.prettify(page)


#Find the right table
all_tables=soup.find_all('table')
right_table=soup.find('table', class_='tablehead')
right_table 

for row in right_table.findAll("tr"):

    col = row.find_all('td')

    column_1 = col[0].string.strip()
    RK.append(column_1)

    column_2 = col[1].string.strip()
    PLAYER.append(column_2)

    column_3 = col[2].string.strip()
    TEAM.append(column_3)

    column_4 = col[3].string.strip()
    GP.append(column_4)

    column_5 = col[4].string.strip()
    G1.append(column_5)

    column_6 = col[5].string.strip()
    A1.append(column_6)

    column_7 = col[6].string.strip()
    PTS.append(column_7)

    column_8 = col[7].string.strip()
    Diff.append(column_8)

    column_9 = col[8].string.strip()
    PIM.append(column_9)

    column_10 = col[9].string.strip()
    PTSG.append(column_10)

    column_11 = col[10].string.strip()
    SOG.append(column_11)

    column_12 = col[11].string.strip()
    PCT.append(column_12)

    column_13 = col[12].string.strip()
    GWG.append(column_13)


    column_14 = col[13].string.strip()
    G2.append(column_14)

    column_15 = col[14].string.strip()
    A2.append(column_15)

    column_16 = col[15].string.strip()
    G3.append(column_16)

    column_17 = col[15].string.strip()
    A3.append(column_17)


columns = {'RK': RK, 'PLAYER':PLAYER, 'TEAM'=TEAM, 'GP': GP, 'G1': G1, 'A1': A1, 'PTS': PTS, 'Diff'=Diff, 'PIM'=PIM, 'PTSG'=PTSG, 'SOG'=SOG, 'PCT'=PCT, 'GWG'=GWG, 'G2'=G2, 'A2'=A2, 'G3'=G3,'A3'=A3}

df = pd.DataFrame(columns)

df

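I am currently getting an error on the columns assignment line (the third line from the end). Could you help me see what I am doing wrong?

Cheers, Andreia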

1 answer:

Answer 0: (score: 1)

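pandas can read HTML tables directly from a URL, so the manual row loop is not needed; you can refer to the documentation for details.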

import pandas as pd

pd.read_html('http://www.espn.com/nhl/statistics/player/_/stat/points/sort/points/year/2015/seasontype/2')

Out:

[     0                       1     2    3    4    5    6    7    8      9   \
 0   NaN                      PP    SH  NaN  NaN  NaN  NaN  NaN  NaN    NaN   
 1    RK                  PLAYER  TEAM   GP    G    A  PTS  +/-  PIM  PTS/G   
 2     1          Jamie Benn, LW   DAL   82   35   52   87    1   64   1.06   
 3     2         John Tavares, C   NYI   82   38   48   86    5   46   1.05   
 4     3        Sidney Crosby, C   PIT   77   28   56   84    5   47   1.09   
 5     4       Alex Ovechkin, LW   WSH   81   53   28   81   10   58   1.00   
 6   NaN       Jakub Voracek, RW   PHI   82   22   59   81    1   78   0.99   
 7     6    Nicklas Backstrom, C   WSH   82   18   60   78    5   40   0.95   
 8     7         Tyler Seguin, C   DAL   71   37   40   77   -1   20   1.08   
 9     8         Jiri Hudler, LW   CGY   78   31   45   76   17   14   0.97   
 10  NaN        Daniel Sedin, LW   VAN   82   20   56   76    5   18   0.93
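In the output above, `pd.read_html` returns a list of DataFrames, and the real column names (RK, PLAYER, TEAM, ...) sit in row 1 of the first one rather than in its header. A minimal sketch of tidying that up might look like the following. It uses a small hand-built frame that mimics the first four columns of the output instead of fetching the page; the `raw` frame and the `promote_header` helper are illustrative assumptions, not part of pandas.

```python
import pandas as pd

# Stand-in for pd.read_html(url)[0]: same layout as the output above,
# cut down to the first four columns and the first three players.
raw = pd.DataFrame([
    [None, "PP", "SH", None],             # row 0: group header row
    ["RK", "PLAYER", "TEAM", "GP"],       # row 1: the actual column names
    ["1", "Jamie Benn, LW", "DAL", "82"],
    ["2", "John Tavares, C", "NYI", "82"],
    ["3", "Sidney Crosby, C", "PIT", "77"],
])


def promote_header(table, header_row=1):
    """Hypothetical helper: use `header_row` as column names, drop rows up to it."""
    tidy = table.iloc[header_row + 1:].copy()
    tidy.columns = table.iloc[header_row].tolist()
    return tidy.reset_index(drop=True)


df = promote_header(raw)

# Values arrive as strings because the header rows were mixed in with the data,
# so convert the numeric columns explicitly.
df["GP"] = pd.to_numeric(df["GP"])

print(df)
```

With the live page, `raw` would be `pd.read_html(url)[0]`. `read_html` also accepts a `header` argument, so passing `header=1` may achieve the same result in one call, though whether that suits this particular page has not been checked here.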