Scrapy returns an empty array from XPath

Posted: 2018-06-26 19:36:53

Tags: python python-3.x xpath scrapy web-crawler

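I'm trying to collect athlete data from the following page: https://www.athletic.net/TrackAndField/Athlete.aspx?AID=7844096#!/L4. I've managed to collect the athlete's name, but I'm having trouble collecting their school name the same way. I know the school name is the text of a link inside the block, but the XPath only returns an empty array.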

Here is my code:

import scrapy

class AthletesSpider(scrapy.Spider):
    name = 'athletes'
    allowed_domains = ['athletic.net']
    start_urls = ['https://www.athletic.net/TrackAndField/Athlete.aspx?AID=7844096#!/L0']

    def parse(self, response):
        yield {
            'athlete_name' : response.xpath("//h2/text()").extract_first(),
            'school_name' : response.xpath("//h1/a/text()").extract_first()
        }

Am I missing something?

1 answer:

Answer 0 (score: 2)

Add a comma between the entries of the dictionary (marked below):

import scrapy

class AthletesSpider(scrapy.Spider):
    name = 'athletes'
    allowed_domains = ['athletic.net']
    start_urls = ['https://www.athletic.net/TrackAndField/Athlete.aspx?AID=7844096#!/L0']

    def parse(self, response):
        yield {
            'athlete_name' : response.xpath("//h2/text()").extract_first(),  # <-- the comma goes here
            'school_name' : response.xpath("//h1/a/text()").extract_first()
        }
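To see why the comma this answer points to matters, the following sketch compiles the spider's dictionary literal with and without it, using only the standard library. The dictionary is compiled but never evaluated, so response does not need to exist; the helper name parses is hypothetical.

```python
def parses(src: str) -> bool:
    """Return True if src compiles as a Python expression, False on SyntaxError."""
    try:
        compile(src, "<snippet>", "eval")
    except SyntaxError:
        return False
    return True


# Only compiled, never run, so `response` is not needed here.
WITH_COMMA = """{
    'athlete_name': response.xpath("//h2/text()").extract_first(),
    'school_name': response.xpath("//h1/a/text()").extract_first()
}"""

# Drop the comma after the first entry.
WITHOUT_COMMA = WITH_COMMA.replace("extract_first(),", "extract_first()", 1)

print(parses(WITH_COMMA))     # True
print(parses(WITHOUT_COMMA))  # False: two entries with no separator
```

Without the comma, Python sees a call expression immediately followed by a string, which is a syntax error, so the spider would fail to load rather than return an empty result.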