Scraping only within certain elements using Scrapy / Python

Asked: 2018-10-10 19:59:56

Tags: python scrapy screen-scraping

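I am trying to scrape from this website, specifically in the following format: [TEAM, PLAYER_NAME]. I can handle each of the two separately with response.xpath().extract(). However, I would like to know whether it is possible to scrape only within one specific element.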

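Below is my code so far. The flow I have in mind is to scrape each away team and then scrape the players listed under it, but I can't find a way to do that. When I run extract(), it extracts every match for the XPath on the whole page.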

# Extract the HTML of every game card
games = response.xpath('//li[@data-role="lineup-card"]').extract()
num_of_games = len(games)

# Loop over the games to build the lineups
lineups = []
j = 0
while j < num_of_games:
    lineups.append([])
    # The team code is the 3 characters that follow the attribute name
    away_start = games[j].find(' data-away="') + 12
    home_start = games[j].find(' data-home="') + 12
    away_team = games[j][away_start:away_start + 3]
    home_team = games[j][home_start:home_start + 3]

    # Scrape players for away_team
    # Problem: this XPath is absolute, so it returns every player on the page
    players = response.xpath("//span[@class='pname']").extract()
    i = 0
    while i < len(players):
        name = players[i]
        name1 = name[name.find(' title="') + 8:]
        name2 = name1[0:name1.find('"')]
        lineups[j].append(name2)
        i += 1
    j += 1
print(lineups)

0 answers:

No answers