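Scraping problem: BeautifulSoup; IndexError: list index out of range

Time: 2018-01-31 17:22:44

Tags: python-3.x web-scraping beautifulsoup python-requests

I am trying to scrape http://www.wtatennis.com/stats, but when I run the complete code I get an error. I have probably been staring at this for too long, but I cannot see the mistake and so cannot fix it.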

import requests, re
from bs4 import BeautifulSoup

r=requests.get("http://www.wtatennis.com/stats")
c=r.content

soup=BeautifulSoup(c,"html.parser")

all=soup.find_all("div",{"class":"view-content"})

#find the results, names, scores

for classes in all:

        position = classes.find_all('td',{"class":"views-field views-field-counter views-align-center"})[0].text
        wta_name = classes.find_all('td',{"class":"views-field views-field-field-lastname views-align-left"})[0].text
        current_ranking = classes.find_all('td',{"class":"views-field views-field-field-current-rank views-align-center"})[0].text
        match_count = classes.find_all('td',{"class":"views-field views-field-field-matchcount views-align-center"})[0].text
        aces_count = classes.find_all('td',{"class":"views-field views-field-field-aces active views-align-center"})[0].text
        df_count = classes.find_all('td',{"class":"views-field views-field-field-double-faults views-align-center"})[0].text
        firstserver_perc = classes.find_all('td',{"class":"views-field views-field-field-first-serve-percent views-align-center"})[0].text
        firstservewon_perc = classes.find_all('td',{"class":"views-field views-field-field-first-serve-won-percent views-align-center"})[0].text
        secondservewon_perc = classes.find_all('td',{"class":"views-field views-field-field-second-serve-won-percent views-align-center"})[0].text


        print (position)
        print (wta_name)
        print (current_ranking)
        print (match_count)
        print (aces_count)
        print (df_count)
        print (firstserver_perc)
        print (firstservewon_perc)
        print (secondservewon_perc)

Result:

1    Goerges, Julia (GER)   12   7   61   25    59.8%    76.0%    52.4%

IndexError                                Traceback (most recent call last)
<ipython-input-6-fabdb2904a0b> in <module>()
     18         current_ranking = classes.find_all('td',{"class":"views-field views-field-field-current-rank views-align-center"})[0].text
     19         match_count = classes.find_all('td',{"class":"views-field views-field-field-matchcount views-align-center"})[0].text
---> 20         aces_count = classes.find_all('td',{"class":"views-field views-field-field-aces active views-align-center"})[0].text
     21         df_count = classes.find_all('td',{"class":"views-field views-field-field-double-faults views-align-center"})[0].text
     22         firstserver_perc = classes.find_all('td',{"class":"views-field views-field-field-first-serve-percent views-align-center"})[0].text

IndexError: list index out of range

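1 Answer:

Answer 0 (score: 0)

Here are the problems I found in your code:

  1. all=soup.find_all("div",{"class":"view-content"}) uses find_all(), which is the wrong call here because the page contains several div tags with the class view-content. Your loop printed the first div without trouble and then failed on a later one, where no td carries the exact class string views-field views-field-field-aces active views-align-center; find_all() returned an empty list there, and [0] raised the IndexError. I changed this line to use find() instead of find_all().
  2. Once that is fixed, a second problem becomes obvious in the printing section: each [0] keeps only the first match, so you get just the first record of the table you are parsing rather than all of the data.
  3. I also removed the re import from your code, because it is not needed.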
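
To make the first point concrete, here is a minimal sketch of how BeautifulSoup treats a multi-class string. The markup is hypothetical and heavily simplified (the real page source is not shown in this thread); it only assumes that the extra "active" class is present in one view-content block and absent in another, which is what the traceback suggests.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: two "view-content" blocks where only the
# first one marks the aces column with the extra "active" class.
html = """
<div class="view-content"><table><tr>
  <td class="views-field views-field-field-aces active views-align-center">61</td>
</tr></table></div>
<div class="view-content"><table><tr>
  <td class="views-field views-field-field-aces views-align-center">61</td>
</tr></table></div>
"""

soup = BeautifulSoup(html, "html.parser")
blocks = soup.find_all("div", {"class": "view-content"})

exact = "views-field views-field-field-aces active views-align-center"

# A multi-class string is compared against the whole class attribute, so the
# second block gives an empty list, and [0] on it would raise IndexError.
exact_counts = [len(b.find_all("td", {"class": exact})) for b in blocks]

# A single class name matches any td that carries it, with or without "active".
single_counts = [
    len(b.find_all("td", class_="views-field-field-aces")) for b in blocks
]

print(exact_counts)
print(single_counts)
```

Searching by the one class that identifies the column is usually less brittle than repeating the full class string, since state classes such as "active" can move between columns.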

    Here is my attempt at your problem:

    import requests
    from bs4 import BeautifulSoup
    
    # Download the page and parse it
    html = requests.get("http://www.wtatennis.com/stats").text
    soup = BeautifulSoup(html, "html.parser")
    
    # find() returns only the first div with the class "view-content"
    table = soup.find("div", {"class": "view-content"})
    
    # Collect every cell of each column (one list per column)
    position = table.find_all('td', {"class": "views-field views-field-counter views-align-center"})
    wta_name = table.find_all('td', {"class": "views-field views-field-field-lastname views-align-left"})
    current_ranking = table.find_all('td', {"class": "views-field views-field-field-current-rank views-align-center"})
    match_count = table.find_all('td', {"class": "views-field views-field-field-matchcount views-align-center"})
    aces_count = table.find_all('td', {"class": "views-field views-field-field-aces active views-align-center"})
    df_count = table.find_all('td', {"class": "views-field views-field-field-double-faults views-align-center"})
    firstserver_perc = table.find_all('td', {"class": "views-field views-field-field-first-serve-percent views-align-center"})
    firstservewon_perc = table.find_all('td', {"class": "views-field views-field-field-first-serve-won-percent views-align-center"})
    secondservewon_perc = table.find_all('td', {"class": "views-field views-field-field-second-serve-won-percent views-align-center"})
    
    # Print one record per table row
    for i in range(len(position)):
        print(position[i].text)
        print(wta_name[i].text)
        print(current_ranking[i].text)
        print(match_count[i].text)
        print(aces_count[i].text)
        print(df_count[i].text)
        print(firstserver_perc[i].text)
        print(firstservewon_perc[i].text)
        print(secondservewon_perc[i].text)
        print("***************")
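
One caveat with indexing parallel lists by position: if any column list comes back shorter than position, the loop raises the same IndexError again. A possible alternative, sketched here with plain strings copied from the output listing rather than real scraped cells, is to combine the columns with zip(), which stops at the shortest list. The dictionary keys are illustrative names, not anything defined by the site.

```python
# Sample values copied from the output listing; in the real script each list
# would hold the .text of the cells returned by find_all().
position = ["1", "2"]
wta_name = ["Goerges, Julia (GER)", "Svitolina, Elina (UKR)"]
aces_count = ["61"]  # deliberately one entry short, to show zip() stopping early

# zip() pairs the columns up row by row and ends at the shortest list,
# so a missing cell cannot trigger an IndexError here.
rows = [
    {"position": p, "name": n, "aces": a}
    for p, n, a in zip(position, wta_name, aces_count)
]

for row in rows:
    print(row)
```

The trade-off is that zip() drops incomplete rows silently, so it is worth comparing the list lengths first if every record must be accounted for.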
    

    Output:

     1
     Goerges, Julia (GER)
    12
    7
    61
    25
     59.8 %
     76.0 %
     52.4 %
    ***************
     2
     Svitolina, Elina (UKR)
    3
    10
    60
    13
     60.1 %
     72.2 %
     47.5 %
    ***************
     3
     Wozniacki, Caroline (DEN)
    1
    12
    58
    37
     64.3 %
     71.9 %
     50.3 %
    ***************
     4
     Pliskova, Karolina (CZE)
    5
    8
    53
    19
     63.9 %
     71.6 %
     47.7 %
    ***************
     5
     Barty, Ashleigh (AUS)
    16
    9
    50
    27
     61.0 %
     67.7 %
     53.6 %
    ***************
     6
     Mertens, Elise (BEL)
    20
    10
    43
    35
     65.8 %
     69.1 %
     46.9 %
    ***************
     7
     Siniakova, Katerina (CZE)
    52
    8
    39
    31
     61.2 %
     65.5 %
     46.5 %
    ***************
     8
     Osaka, Naomi (JPN)
    53
    5
    38
    11
     62.5 %
     69.4 %
     44.8 %
    ***************
     9
     Pliskova, Kristyna (CZE)
    78
    5
    38
    17
     59.3 %
     70.4 %
     41.3 %
    ***************
     10
     Keys, Madison (USA)
    14
    6
    37
    17
     61.1 %
     73.9 %
     46.8 %
    ***************
     11
     Bertens, Kiki (NED)
    28
    6
    35
    26
     61.2 %
     70.1 %
     39.6 %
    ***************
     12
     Sevastova, Anastasija (LAT)
    15
    7
    34
    11
     60.2 %
     71.4 %
     47.7 %
    ***************
     13
     Konta, Johanna (GBR)
    11
    6
    31
    22
     65.6 %
     66.1 %
     50.0 %
    ***************
     14
     Halep, Simona (ROU)
    2
    12
    30
    27
     66.1 %
     68.2 %
     50.3 %
    ***************
     15
     Kontaveit, Anett (EST)
    27
    6
    29
    32
     63.9 %
     67.3 %
     48.3 %
    ***************
     16
     Strycova, Barbora (CZE)
    24
    10
    29
    25
     65.6 %
     64.4 %
     46.7 %
    ***************
     17
     Giorgi, Camila (ITA)
    63
    7
    26
    27
     59.3 %
     65.8 %
     48.2 %
    ***************
     18
     Sharapova, Maria (RUS)
    41
    7
    26
    36
     60.0 %
     70.0 %
     48.0 %
    ***************
     19
     Kanepi, Kaia (EST)
    66
    6
    25
    24
     56.8 %
     64.3 %
     50.3 %
    ***************
     20
     Watson, Heather (GBR)
    75
    6
    25
    17
     62.2 %
     65.0 %
     50.7 %
    ***************