Converting lxml code to the Scrapy xxs selector

Time: 2013-07-25 15:06:19

Tags: python xml screen-scraping scrapy lxml

How can I convert this plain Python lxml code to use Scrapy's built-in xxs selector? It works as is, but I would like to rewrite it with the Scrapy xxs selector.

    def parse_device_list(self, response):
        self.log("\n\n\n List of devices \n\n\n")
        self.log('Hi, this is the parse_device_list page! %s' % response.url)
        root = lxml.etree.fromstring(response.body)
        for row in root.xpath('//row'):
            allcells = row.xpath('./cell')
            # The first cell contains the link to follow
            detail_page_link = allcells[0].get("href")
            yield Request(urlparse.urljoin(response.url, detail_page_link), callback=self.parse_page)

1 Answer:

Answer 0: (score: 0)

Give this a try:

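    import urlparse
    from scrapy.http import Request
    from scrapy.selector import XmlXPathSelector

    def parse_device_list(self, response):
        xxs = XmlXPathSelector(response)
        for row in xxs.select('//row'):
            # The first cell of each row holds the link to follow
            detail_page_link = row.select('./cell[1]/@href')[0].extract()
            yield Request(urlparse.urljoin(response.url, detail_page_link), callback=self.parse_page)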
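
To check the row and first-cell logic without running a spider, something like the following sketch may help. It is not Scrapy code: it swaps in the standard library's `xml.etree.ElementTree` as a stand-in for both lxml and the xxs selector, and it is written for Python 3 (so `urllib.parse.urljoin` replaces `urlparse.urljoin`). The sample XML, the base URL, and the helper name `extract_detail_links` are all hypothetical, made up only to mirror the structure the question implies (rows whose first cell carries an `href` attribute).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical sample body; the real feed's layout is only assumed from the
# question (each <row> has <cell> children, the first one with an href).
SAMPLE_XML = """<rows>
  <row><cell href="/device/1">Device 1</cell><cell>online</cell></row>
  <row><cell href="/device/2">Device 2</cell><cell>offline</cell></row>
</rows>"""


def extract_detail_links(body, base_url):
    """Return the absolute URL taken from the first cell of every row."""
    root = ET.fromstring(body)
    links = []
    for row in root.iter("row"):
        # XPath positions are 1-based, so cell[1] is the first child cell.
        first_cell = row.find("./cell[1]")
        if first_cell is None:
            continue  # row without any cell
        href = first_cell.get("href")
        if href:
            # Resolve relative links against the page URL, as the spider does.
            links.append(urljoin(base_url, href))
    return links


links = extract_detail_links(SAMPLE_XML, "http://example.com/devices/list")
print(links)
```

With the sample data above this prints `['http://example.com/device/1', 'http://example.com/device/2']`. Unlike the spider code, the sketch skips rows that lack a cell or an `href` instead of raising an `IndexError`; whether that is the right behavior depends on the feed.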