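I'm trying to scrape this website, specifically into the following format: [TEAM, PLAYER_NAME]. I'm also using response.xpath().extract() to handle each piece separately. However, I'd like to know whether it's possible to scrape within one specific element.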
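Below is my code so far. My idea for the flow was to scrape each away team and then scrape the players listed under it, but I couldn't find a way to do that. When I run extract(), it returns every element that matches the XPath across the whole page.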
```python
# Extract every game card on the page (each item is a raw HTML string)
games = response.xpath('//li[@data-role="lineup-card"]').extract()
num_of_games = len(games)

lineups = []
j = 0
# Loop over the games to build one lineup list per game
while j < num_of_games:
    lineups.append([])
    # Slice the 3-letter team codes out of the raw HTML string
    away_team = games[j][games[j].find(' data-away="')+12:games[j].find(' data-away="')+15]
    home_team = games[j][games[j].find(' data-home="')+12:games[j].find(' data-home="')+15]
    # Scrape players for away_team
    # (this XPath starts with //, so it matches every player on the page)
    players = response.xpath("//span[@class='pname']").extract()
    i = 0
    while i < len(players):
        name = players[i]
        # Keep the text after ' title="' up to the next double quote
        name1 = name[name.find(' title="')+8:1000]
        name2 = name1[0:name1.find('"')]
        lineups[j].append(name2)
        i += 1
    j += 1
print(lineups)
```