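I've been trying to scrape the schools that have offered scholarships to high school football players, but I've run into a couple of problems.
Here is a sample page: https://n.rivals.com/content/prospects/2021/de-javion-stepney-235539#school-interests
Once the table is expanded I can already scrape the names of all the schools, but I only want the schools that have a checkmark in the same row as the school name. How can I do that?
Also, although I can scrape the school names, the scraper often repeats seemingly random rows before moving on to the next player page, and I don't know why.
This is what I have so far: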
Offered_By_List = []
for s in driver.find_elements_by_class_name('school-logo-name'):
    Offered_By_List.append(s)
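Any help would be greatly appreciated, I'm stuck on this!
Answer 0 (score: 1)
You can use XPath to express the relationship between the checkmark and its table row. The XPath below selects only the rows that contain a checkmark in the fifth cell (15 rows on this page). You can then store the result in a list, iterate over the rows, and save the school names.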
//tbody/tr[td[5]/div[@class="checkmark ng-scope"]]
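Or use the code below directly:
# Single quotes around the XPath so the double quotes inside it do not end the string.
names = browser.find_elements_by_xpath('//tbody/tr[td[5]/div[@class="checkmark ng-scope"]]/td[1]/div/*[@class="ng-binding ng-scope"]')
for s in names:
    print(s.text)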
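To see the row filter in isolation, here is a minimal offline sketch of the same idea: keep a row only if its fifth cell holds a checkmark, then read the name from the first cell. The `SAMPLE` markup is a hypothetical, simplified stand-in for the expanded table, not the real page source, and `offered_schools` is an illustrative name. It uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath, so the nested predicate is split into two `find` calls.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the expanded table (not the real page markup).
SAMPLE = """
<table><tbody>
  <tr>
    <td><div class="school-logo-name">Akron</div></td>
    <td/><td/><td/>
    <td><div class="checkmark ng-scope"/></td>
  </tr>
  <tr>
    <td><div class="school-logo-name">Illinois</div></td>
    <td/><td/><td/>
    <td/>
  </tr>
  <tr>
    <td><div class="school-logo-name">Temple</div></td>
    <td/><td/><td/>
    <td><div class="checkmark ng-scope"/></td>
  </tr>
</tbody></table>
"""

def offered_schools(table_xml):
    """Return the school names from rows whose fifth cell has a checkmark."""
    root = ET.fromstring(table_xml.strip())
    names = []
    for row in root.iter("tr"):
        # Skip rows without a checkmark div in the fifth cell.
        if row.find("td[5]/div[@class='checkmark ng-scope']") is None:
            continue
        name = row.find("td[1]/div[@class='school-logo-name']")
        if name is not None and name.text:
            names.append(name.text.strip())
    return names

print(offered_schools(SAMPLE))  # ['Akron', 'Temple']
```

Against the live page the single XPath above does the same filtering in one query; the sketch only makes the per-row logic explicit.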
Answer 1 (score: 1)
Use the ancestor axis. This scrapes only the school names:
driver.find_elements_by_xpath('//div[@class="checkmark ng-scope"]//ancestor::tr//div[@class="school-logo-name"]')
However, if you want to scrape all the data in each row, just remove //div[@class="school-logo-name"] from the XPath above.
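A possible way to wrap this up, sketched under two assumptions that are not from the answer: that you are on Selenium 4, where the `find_elements_by_*` helpers were deprecated and later removed in favor of `find_elements(by, value)`, and that dropping repeated names is acceptable as a workaround for the duplicated rows mentioned in the question (it does not explain why they occur). `offered_school_names` is a hypothetical helper name.

```python
# Assumption: Selenium 4 style locator call. "xpath" is the string value of
# selenium.webdriver.common.by.By.XPATH, used here to keep the sketch import-free.
OFFER_XPATH = (
    '//div[@class="checkmark ng-scope"]'
    '//ancestor::tr//div[@class="school-logo-name"]'
)

def offered_school_names(driver):
    """Return the checkmarked school names in page order, without repeats."""
    elements = driver.find_elements("xpath", OFFER_XPATH)
    names = [el.text.strip() for el in elements]
    # dict.fromkeys keeps the first occurrence of each name and preserves order.
    return [n for n in dict.fromkeys(names) if n]
```

Calling `offered_school_names(driver)` once per player page, after the table has been expanded, would give one list per player.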
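Answer 2 (score: 1)
I wouldn't use Selenium here, because the data is returned in the HTML as valid JSON inside element attributes. There are several ways to extract the school names, but I did it with pandas, since that also puts the data into a table, and if you want more than just the school names you can work with it however you like. I also grabbed the player data while I was at it: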
import requests
from bs4 import BeautifulSoup
import json
import pandas as pd
url = 'https://n.rivals.com/content/prospects/2021/de-javion-stepney-235539#school-interests'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
# Player data: JSON stored in the 'prospect' attribute of this element
jsonStr = soup.find('rv-user-forecast-banner')['prospect']
playerData = json.loads(jsonStr)
df1 = pd.DataFrame(playerData)
print(df1)
# School interests: JSON stored in the 'data' attribute of this element
jsonStr = soup.find('rv-prospect-school-interests')['data']
schoolIntData = json.loads(jsonStr)
df2 = pd.DataFrame(schoolIntData)
print(df2)
# Keep only the rows where an offer was made, then collect the school names
schoolsOffered = df2[df2['offer'] == True]
Offered_By_List = list(schoolsOffered['team_name'])
Output:
print(df2.to_string())
college_id commit commit_date commitments_url id interest offer recruiters sign site_id site_name team_logo team_name visits
0 51 True 2020-04-27 //centralmichigan.rivals.com/commitments/footb... 779864 HIGH True [] False 21.0 centralmichigan https://s.yimg.com/xe/ipt/CentralMichiganChipp... Central Michigan []
1 48 False None None 794033 NONE True [] False NaN None https://s.yimg.com/dh/ap/default/151102/akron-... Akron []
2 49 False None None 850677 NONE True [] False NaN None https://s.yimg.com/cv/ae/default/170622/Ball-S... Ball State []
3 11 False None //bostoncollege.rivals.com/commitments/footbal... 817746 NONE True [] False 15.0 bostoncollege https://s.yimg.com/xe/i/us/sp/v/ncaaf/teams/20... Boston College []
4 50 False None None 835972 NONE True [] False NaN None https://s.yimg.com/xe/i/us/sp/v/ncaaf/teams/20... Bowling Green []
5 209 False None None 835973 NONE True [] False NaN None https://sp.yimg.com/j/assets/ipt/BuffaloBulls.png Buffalo []
6 99 False None //cincinnati.rivals.com/commitments/football/2021 833595 NONE True [] False 23.0 cincinnati https://s.yimg.com/xe/ipt/CINC_300.png Cincinnati []
7 28 False None //Indiana.rivals.com/commitments/football/2021 825797 NONE True [] False 51.0 Indiana https://s.yimg.com/xe/i/us/sp/v/ncaaf/teams/20... Indiana []
8 20 False None //iowastate.rivals.com/commitments/football/2021 836116 NONE True [] False 55.0 iowastate https://s.yimg.com/dh/ap/default/151102/IowaSt... Iowa State []
9 53 False None //kentstate.rivals.com/commitments/football/2021 850678 NONE True [] False 62.0 kentstate https://s.yimg.com/xe/i/us/sp/v/ncaaf/teams/20... Kent State []
10 54 False None None 850676 NONE True [] False NaN None https://s.yimg.com/cv/ae/default/170623/Miami-... Miami (OH) []
11 15 False None //syracuse.rivals.com/commitments/football/2021 804815 NONE True [] False 133.0 syracuse https://s.yimg.com/xe/i/us/sp/v/ncaaf/teams/20... Syracuse []
12 16 False None //temple.rivals.com/commitments/football/2021 826633 NONE True [] False 135.0 temple https://s.yimg.com/xe/i/us/sp/v/ncaaf/teams/20... Temple []
13 56 False None //toledo.rivals.com/commitments/football/2021 783815 NONE True [] False 144.0 toledo https://s.yimg.com/dh/ap/default/160427/Toledo... Toledo []
14 57 False None //westernmichigan.rivals.com/commitments/footb... 783814 NONE True [] False 168.0 westernmichigan https://s.yimg.com/dh/ap/default/170213/wm_nca... Western Michigan []
15 27 False None //Illinois.rivals.com/commitments/football/2021 783816 NONE False [] False 49.0 Illinois https://s.yimg.com/xe/ipt/illinois_300.png Illinois []
16 72 False None //Tennessee.rivals.com/commitments/football/2021 856404 NONE False [] False 136.0 Tennessee https://s.yimg.com/dh/ap/default/151102/Tennes... Tennessee []
17 63 False None //USC.rivals.com/commitments/football/2021 856403 NONE False [] False 151.0 USC https://s.yimg.com/xe/i/us/sp/v/ncaaf/teams/20... USC []
And the list:
print(Offered_By_List)
['Central Michigan', 'Akron', 'Ball State', 'Boston College', 'Bowling Green', 'Buffalo', 'Cincinnati', 'Indiana', 'Iowa State', 'Kent State', 'Miami (OH)', 'Syracuse', 'Temple', 'Toledo', 'Western Michigan']
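If you only need the names and want to skip pandas, the same "JSON in an attribute" idea can be sketched with the standard library alone. `SAMPLE_HTML` below is a hypothetical, heavily trimmed stand-in for the page source (only the `team_name` and `offer` fields are kept), and `AttrGrabber` and `offered_by` are illustrative names, not part of any library.

```python
import json
from html.parser import HTMLParser

class AttrGrabber(HTMLParser):
    """Collect one attribute value from the first matching tag."""

    def __init__(self, tag, attr):
        super().__init__()
        self.tag, self.attr, self.value = tag, attr, None

    def handle_starttag(self, tag, attrs):
        if tag == self.tag and self.value is None:
            self.value = dict(attrs).get(self.attr)

# Hypothetical, trimmed stand-in for the real page source.
SAMPLE_HTML = (
    "<rv-prospect-school-interests data='["
    '{"team_name": "Akron", "offer": true},'
    '{"team_name": "Illinois", "offer": false},'
    '{"team_name": "Temple", "offer": true}'
    "]'></rv-prospect-school-interests>"
)

def offered_by(html):
    """Return team names whose 'offer' flag is true in the embedded JSON."""
    grabber = AttrGrabber("rv-prospect-school-interests", "data")
    grabber.feed(html)
    if grabber.value is None:
        return []  # element or attribute not found
    schools = json.loads(grabber.value)
    return [s["team_name"] for s in schools if s.get("offer")]

print(offered_by(SAMPLE_HTML))  # ['Akron', 'Temple']
```

With the real page you would pass `response.text` instead of `SAMPLE_HTML`; the pandas version above remains the better fit when you want the other columns as well.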