How to use indexing with ItemLoader's add_xpath method

Date: 2016-10-02 14:55:44

Tags: python xpath scrapy

I am trying to rewrite this code to use the ItemLoader class:

import scrapy

from ..items import Book


class BasicSpider(scrapy.Spider):
    ...
    def parse(self, response):
        item = Book()

        # note: I only grab the first of the many books on the page
        item['title'] = response.xpath('//*[@class="link linkWithHash detailsLink"]/@title')[0].extract()
        return item

The above works fine. Now the same with ItemLoader:

import scrapy
from scrapy.loader import ItemLoader

from ..items import Book


class BasicSpider(scrapy.Spider):
    ...    
    def parse(self, response):
        l = ItemLoader(item=Book(), response=response)

        l.add_xpath('title', '//*[@class="link linkWithHash detailsLink"]/@title'[0])  # this does not work - returns an empty dict
        # l.add_xpath('title', '//*[@class="link linkWithHash detailsLink"]/@title')  # this works, of course, but returns every book title on the page, not just the first one, which is what I need
        return l.load_item()

So I only want to get the first book title. How can I do that?

1 answer:

Answer 0 (score: 0)

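There are two problems with your code. First, XPath uses one-based indexing, so the first match is [1], not [0]. Second, the index brackets must be inside the string you pass to add_xpath. Placed outside the string, as in '...'[0], they index the Python string itself, so add_xpath receives only its first character, '/', instead of your expression. The parentheses around the expression are also needed: they make [1] apply to the whole result set rather than to the title attribute of each matched element.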

So the correct code looks like this:

l.add_xpath('title', '(//*[@class="link linkWithHash detailsLink"]/@title)[1]')
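
To see why the original attempt failed, here is a minimal sketch that needs only plain Python, no Scrapy. It shows what the misplaced [0] actually evaluates to and how the corrected expression is built. The names TITLE_XPATH and first_match are hypothetical, introduced only for illustration; they are not part of Scrapy.

```python
# The XPath from the question, as a plain Python string.
TITLE_XPATH = '//*[@class="link linkWithHash detailsLink"]/@title'

# Subscripting the string indexes its characters, not the XPath matches,
# so add_xpath would receive only the first character of the expression.
broken = TITLE_XPATH[0]
print(repr(broken))  # '/'


def first_match(xpath):
    """Hypothetical helper: wrap an XPath so it selects only its first match.

    The parentheses make [1] apply to the whole result set, and XPath
    positions start at 1, not 0.
    """
    return "({})[1]".format(xpath)


fixed = first_match(TITLE_XPATH)
print(fixed)  # (//*[@class="link linkWithHash detailsLink"]/@title)[1]
```

Under these assumptions, l.add_xpath('title', first_match(TITLE_XPATH)) passes the same string as the corrected line above.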