I am trying to scrape several kinds of information for every Premier League player from transfermarkt pages.
The relevant code is:
import requests
from bs4 import BeautifulSoup

# Full_Links (team page URLs) and headers (request headers) are defined earlier
# Create empty lists for the player links
playerLink1 = []
playerLink2 = []
playerLink3 = []
# For each team link page...
for i in range(len(Full_Links)):
    # ...download the team page and process the HTML...
    squadPage = requests.get(Full_Links[i], headers=headers)
    squadTree = squadPage.text
    SquadSoup = BeautifulSoup(squadTree, 'html.parser')
    # ...extract the player links...
    playerLocation = SquadSoup.find("div", {"class": "responsive-table"}).find_all("a", {"class": "spielprofil_tooltip"})
    for a in playerLocation:
        playerLink1.append(a['href'])
# Drop duplicate links while keeping their original order
for x in playerLink1:
    if x not in playerLink2:
        playerLink2.append(x)
# ...for each player link found on the team pages...
for j in range(len(playerLink2)):
    # ...save the link, complete with domain...
    temp2 = "https://www.transfermarkt.co.uk" + playerLink2[j]
    # ...and add the finished link to the playerLink3 list
    playerLink3.append(temp2)
# Populate lists with each player
# For each player...
for i in range(len(playerLink3)):
    # ...download and process the player page collected earlier...
    playerPage = requests.get(playerLink3[i], headers=headers)
    playerTree = playerPage.text
    PlayerSoup = BeautifulSoup(playerTree, 'html.parser')
    # ...find the relevant data point for each player, starting with the name
    # (this is the line that does not work)
    tempName = PlayerSoup.find("div", {"class": "spielerdaten "}).find_all("a", {"class": "spielprofil_tooltip"})
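The problem is the last line, "tempName" (which is wrong): I cannot find any class to look up the footballer's name with.
Here is an example player link: https://www.transfermarkt.co.uk/ederson/profil/spieler/238223
Any tips on how to extract data from this HTML? Besides the name, I also need to pull more data from the same place.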
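Answer 0 (score: 1)
The page is dynamic and is rendered after the initial request. You have to access the data through an API (if one is available), or use browser automation (such as Selenium) to open the page, let it render, and then pull the HTML: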
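from io import StringIO
import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service('C:/chromedriver_win32/chromedriver.exe'))
driver.get('https://www.transfermarkt.co.uk/ederson/profil/spieler/238223')  # get() returns None
# Parse the tables in the rendered HTML and keep the first one
df = pd.read_html(StringIO(driver.page_source))[0]
driver.quit()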
Output:
print(df.to_string())
    0                                 1
0   Full name:                        Ederson Santana de Moraes
1   Date of birth:                    Aug 17, 1993
2   Place of birth:                   Osasco (SP)
3   Age:                              26
4   Height:                           1,88 m
5   Citizenship:                      Brazil Portugal
6   Position:                         Goalkeeper
7   Foot:                             left
8   Player agent:                     Gestifute
9   Current club:                     Manchester City
10  Joined:                           Jul 1, 2017
11  Contract expires:                 30.06.2025
12  Date of last contract extension:  May 13, 2018
13  Outfitter:                        Nike
14  Social media:                     NaN
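Since the question asks for more fields than just the name, the two-column table can be turned into one record per player. The following is only a minimal sketch: the helper name rows_to_record and the sample rows are made up for illustration, and it assumes each row is a (label, value) pair like the ones printed above.

```python
import math


def rows_to_record(rows):
    """Turn (label, value) pairs into a dict keyed by the cleaned label."""
    record = {}
    for label, value in rows:
        # "Full name:" -> "Full name"
        key = str(label).strip().rstrip(":").strip()
        # pandas represents empty cells as float NaN; store None instead.
        if isinstance(value, float) and math.isnan(value):
            value = None
        elif isinstance(value, str):
            value = value.strip()
        record[key] = value
    return record


# Hypothetical sample rows mirroring part of the table above.
sample_rows = [
    ("Full name:", "Ederson Santana de Moraes"),
    ("Position:", "Goalkeeper"),
    ("Social media:", float("nan")),
]
player = rows_to_record(sample_rows)
print(player["Full name"])
```

With the DataFrame from the Selenium snippet, the same helper could be called as rows_to_record(df.itertuples(index=False, name=None)), and the resulting dicts collected in a list, one per player.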
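Answer 1 (score: 0)
I don't know whether this is a real solution for your case, but you could try using the element's XPath instead of its class. An XPath is the path through the HTML document to a specific element. So if the player's name sits at the same position in the HTML on every page, you can extract that element each time.
To find the XPath in Firefox, locate the element in the Inspector, then right-click it -> Copy -> XPath.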
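To illustrate the idea of looking an element up by position rather than by class, here is a rough sketch. It uses the standard library's xml.etree.ElementTree, which supports only a small XPath subset and needs well-formed markup, so for real pages a lenient parser such as lxml.html would probably have to be substituted. The markup in SAMPLE_PAGE and the helper text_at are invented for the example and do not reflect the site's actual structure.

```python
import xml.etree.ElementTree as ET

# Invented markup for illustration only; not the site's real structure.
SAMPLE_PAGE = """
<html>
  <body>
    <div>
      <span>Full name:</span>
      <span>Ederson Santana de Moraes</span>
      <span>Position:</span>
      <span>Goalkeeper</span>
    </div>
  </body>
</html>
"""


def text_at(root, path):
    """Return the stripped text of the first element matching path, or None."""
    node = root.find(path)
    if node is None or node.text is None:
        return None
    return node.text.strip()


root = ET.fromstring(SAMPLE_PAGE.strip())
# A browser-style path such as /html/body/div/span[2], written relative to <html>.
full_name = text_at(root, "./body/div/span[2]")
position = text_at(root, "./body/div/span[4]")
# A path that matches nothing yields None instead of raising.
missing = text_at(root, "./body/div/span[9]")
print(full_name, position, missing)
```

Positional paths like these break as soon as the page layout shifts, so it is worth checking the result for None on every page before using it.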