I am trying to scrape http://www.wtatennis.com/stats, but when I run the complete code I get an error. I may be overcomplicating this, but I can't see what is wrong, so I can't fix it.
import requests, re
from bs4 import BeautifulSoup
r=requests.get("http://www.wtatennis.com/stats")
c=r.content
soup=BeautifulSoup(c,"html.parser")
all=soup.find_all("div",{"class":"view-content"})
#find the results, names, scores
for classes in all:
    position = classes.find_all('td',{"class":"views-field views-field-counter views-align-center"})[0].text
    wta_name = classes.find_all('td',{"class":"views-field views-field-field-lastname views-align-left"})[0].text
    current_ranking = classes.find_all('td',{"class":"views-field views-field-field-current-rank views-align-center"})[0].text
    match_count = classes.find_all('td',{"class":"views-field views-field-field-matchcount views-align-center"})[0].text
    aces_count = classes.find_all('td',{"class":"views-field views-field-field-aces active views-align-center"})[0].text
    df_count = classes.find_all('td',{"class":"views-field views-field-field-double-faults views-align-center"})[0].text
    firstserver_perc = classes.find_all('td',{"class":"views-field views-field-field-first-serve-percent views-align-center"})[0].text
    firstservewon_perc = classes.find_all('td',{"class":"views-field views-field-field-first-serve-won-percent views-align-center"})[0].text
    secondservewon_perc = classes.find_all('td',{"class":"views-field views-field-field-second-serve-won-percent views-align-center"})[0].text
    print (position)
    print (wta_name)
    print (current_ranking)
    print (match_count)
    print (aces_count)
    print (df_count)
    print (firstserver_perc)
    print (firstservewon_perc)
    print (secondservewon_perc)
Result:
1 Goerges, Julia (GER) 12 7 61 25 59.8% 76.0% 52.4%
IndexError Traceback (most recent call last)
<ipython-input-6-fabdb2904a0b> in <module>()
18 current_ranking = classes.find_all('td',{"class":"views-field views-field-field-current-rank views-align-center"})[0].text
19 match_count = classes.find_all('td',{"class":"views-field views-field-field-matchcount views-align-center"})[0].text
---> 20 aces_count = classes.find_all('td',{"class":"views-field views-field-field-aces active views-align-center"})[0].text
21 df_count = classes.find_all('td',{"class":"views-field views-field-field-double-faults views-align-center"})[0].text
22 firstserver_perc = classes.find_all('td',{"class":"views-field views-field-field-first-serve-percent views-align-center"})[0].text
IndexError: list index out of range
Answer 0 (score: 0)
Here is the problem I found in your code:
all=soup.find_all("div",{"class":"view-content"})
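uses find_all, which is the problem, because the page contains more than one div tag with the class view-content. Your loop runs once per div. The first div prints correctly, but judging by your traceback, a later one has no td whose class string matches the one you pass for the aces column, so find_all returns an empty list and [0] raises the IndexError. I changed this line to use find() instead of find_all(), so only the first matching div is searched. Also note that I removed the re import from your code, since it isn't needed.
Here is my attempt at your problem: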
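To make the failure easier to see without hitting the site, here is a minimal sketch on made-up markup. The HTML below is an assumption for illustration only (two view-content divs, where only the first marks the aces cell as "active"); the real page may differ in detail.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, not copied from the site: two "view-content" divs,
# and only the first one has the aces cell carrying the extra "active" class.
HTML = """
<div class="view-content"><table><tr>
  <td class="views-field views-field-field-aces active views-align-center">61</td>
</tr></table></div>
<div class="view-content"><table><tr>
  <td class="views-field views-field-field-aces views-align-center">61</td>
</tr></table></div>
"""

ACES = "views-field views-field-field-aces active views-align-center"

soup = BeautifulSoup(HTML, "html.parser")
blocks = soup.find_all("div", {"class": "view-content"})

# How many cells match the exact class string inside each div
matches = [len(block.find_all("td", {"class": ACES})) for block in blocks]
print(matches)

# Indexing the empty result from the second div reproduces the error
try:
    blocks[1].find_all("td", {"class": ACES})[0]
    error = None
except IndexError as exc:
    error = str(exc)
print(error)

# find() returns only the first div, so the lookup succeeds
first = soup.find("div", {"class": "view-content"})
aces_text = first.find_all("td", {"class": ACES})[0].text
print(aces_text)
```

The second div still has an aces cell, but its class string is not identical to the one being searched for, so the result list is empty.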
import requests
from bs4 import BeautifulSoup
c = requests.get("http://www.wtatennis.com/stats").text
soup = BeautifulSoup(c, "html.parser")
c = soup.find("div", {"class": "view-content"})
position = c.find_all('td', {"class": "views-field views-field-counter views-align-center"})
wta_name = c.find_all('td', {"class": "views-field views-field-field-lastname views-align-left"})
current_ranking = c.find_all('td', {"class": "views-field views-field-field-current-rank views-align-center"})
match_count = c.find_all('td', {"class": "views-field views-field-field-matchcount views-align-center"})
aces_count = c.find_all('td', {"class": "views-field views-field-field-aces active views-align-center"})
df_count = c.find_all('td', {"class": "views-field views-field-field-double-faults views-align-center"})
firstserver_perc = c.find_all('td', {"class": "views-field views-field-field-first-serve-percent views-align-center"})
firstservewon_perc = c.find_all('td', {"class": "views-field views-field-field-first-serve-won-percent views-align-center"})
secondservewon_perc = c.find_all('td', {"class": "views-field views-field-field-second-serve-won-percent views-align-center"})
for i in range(0, len(position)):
    print(position[i].text)
    print(wta_name[i].text)
    print(current_ranking[i].text)
    print(match_count[i].text)
    print(aces_count[i].text)
    print(df_count[i].text)
    print(firstserver_perc[i].text)
    print(firstservewon_perc[i].text)
    print(secondservewon_perc[i].text)
    print("***************")
Output:
1
Goerges, Julia (GER)
12
7
61
25
59.8 %
76.0 %
52.4 %
***************
2
Svitolina, Elina (UKR)
3
10
60
13
60.1 %
72.2 %
47.5 %
***************
3
Wozniacki, Caroline (DEN)
1
12
58
37
64.3 %
71.9 %
50.3 %
***************
4
Pliskova, Karolina (CZE)
5
8
53
19
63.9 %
71.6 %
47.7 %
***************
5
Barty, Ashleigh (AUS)
16
9
50
27
61.0 %
67.7 %
53.6 %
***************
6
Mertens, Elise (BEL)
20
10
43
35
65.8 %
69.1 %
46.9 %
***************
7
Siniakova, Katerina (CZE)
52
8
39
31
61.2 %
65.5 %
46.5 %
***************
8
Osaka, Naomi (JPN)
53
5
38
11
62.5 %
69.4 %
44.8 %
***************
9
Pliskova, Kristyna (CZE)
78
5
38
17
59.3 %
70.4 %
41.3 %
***************
10
Keys, Madison (USA)
14
6
37
17
61.1 %
73.9 %
46.8 %
***************
11
Bertens, Kiki (NED)
28
6
35
26
61.2 %
70.1 %
39.6 %
***************
12
Sevastova, Anastasija (LAT)
15
7
34
11
60.2 %
71.4 %
47.7 %
***************
13
Konta, Johanna (GBR)
11
6
31
22
65.6 %
66.1 %
50.0 %
***************
14
Halep, Simona (ROU)
2
12
30
27
66.1 %
68.2 %
50.3 %
***************
15
Kontaveit, Anett (EST)
27
6
29
32
63.9 %
67.3 %
48.3 %
***************
16
Strycova, Barbora (CZE)
24
10
29
25
65.6 %
64.4 %
46.7 %
***************
17
Giorgi, Camila (ITA)
63
7
26
27
59.3 %
65.8 %
48.2 %
***************
18
Sharapova, Maria (RUS)
41
7
26
36
60.0 %
70.0 %
48.0 %
***************
19
Kanepi, Kaia (EST)
66
6
25
24
56.8 %
64.3 %
50.3 %
***************
20
Watson, Heather (GBR)
75
6
25
17
62.2 %
65.0 %
50.7 %
***************
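If you want something less sensitive to the exact class string, one option is to walk the table row by row and match each cell on a single class with a CSS selector. This is only a sketch under assumptions: the sample HTML is made up (the values are taken from the output above), and FIELDS and parse_rows are hypothetical names I introduced, not part of the site or of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page has more columns and rows.
HTML = """
<div class="view-content"><table><tbody>
<tr>
  <td class="views-field views-field-counter views-align-center">1</td>
  <td class="views-field views-field-field-lastname views-align-left">Goerges, Julia (GER)</td>
  <td class="views-field views-field-field-aces active views-align-center">61</td>
</tr>
<tr>
  <td class="views-field views-field-counter views-align-center">2</td>
  <td class="views-field views-field-field-lastname views-align-left">Svitolina, Elina (UKR)</td>
  <td class="views-field views-field-field-aces active views-align-center">60</td>
</tr>
</tbody></table></div>
"""

# Hypothetical mapping: output name -> CSS selector matching one class only,
# so an extra class such as "active" does not affect the match.
FIELDS = {
    "position": "td.views-field-counter",
    "wta_name": "td.views-field-field-lastname",
    "aces_count": "td.views-field-field-aces",
}


def parse_rows(html):
    """Return one dict per table row; a missing cell becomes None."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("div", {"class": "view-content"})
    rows = []
    for tr in table.find_all("tr"):
        row = {}
        for name, selector in FIELDS.items():
            cell = tr.select_one(selector)
            row[name] = cell.get_text(strip=True) if cell is not None else None
        rows.append(row)
    return rows


rows = parse_rows(HTML)
for row in rows:
    print(row)
```

Reading one row at a time keeps each player's values together, and a missing cell shows up as None instead of raising IndexError.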