How to crawl multiple pages in a single spider using scrapy

Date: 2015-03-05 03:11:38

Tags: python web-scraping web-crawler scrapy

I need to get the URL of every product on this page http://www.stalkbuylove.com/new-arrivals/week-2.html#/page/1 and then fetch each product's details from its product link. I don't know how to do this.

import scrapy
import json
import redis

r_server = redis.Redis('localhost')


class DmozSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["stalkbuylove.com"]
    start_urls = [
        "http://www.stalkbuylove.com/new-arrivals/week-2.html#/page/1"
    ]

    def parse(self, response):
        for sel in response.css('.product-detail-slide'):
            name = sel.xpath('div/a/@title').extract()
            price = sel.xpath('div/span/span/text()').extract()
            productUrl = sel.xpath('div/a/@href').extract()
        request = scrapy.Request(''.join(productUrl), callback=self.parseProductPage)
        r_server.hset(name,"Name",name)
        r_server.hset(name,"Price",price)
        r_server.hset(name,"ProductUrl",productUrl)

        print name, price, productUrl

    def parseProductPage(self, response):
        for sel in response.css('.top-details-product'):    
            availability = sel.xpath('div/link/@href').extract()
            print availability

Can anyone help? Once I have the product URL, how do I crawl it? Right now I'm calling parseProductPage, and it isn't working properly.

0 Answers:

No answers
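Although no answer was posted, the core problem is visible in the question's `parse()`: `name`, `price`, and `productUrl` are overwritten on every loop iteration, and the `scrapy.Request` is built only once, after the loop, and is never yielded back to the engine. Moving the request creation inside the loop and yielding it (`yield scrapy.Request(url, callback=self.parseProductPage)`) produces one detail-page request per product. A minimal pure-Python sketch of the difference, with plain dicts standing in for the selectors (no Scrapy required):

```python
def parse_broken(products):
    """Mimics the question's parse(): the loop variable is
    overwritten on each pass, and only one URL survives the loop."""
    for product in products:
        url = product["url"]  # overwritten every iteration
    return [url]              # only the last product's URL remains

def parse_fixed(products):
    """One URL (i.e. one follow-up request) yielded per product,
    the way a Scrapy callback should yield scrapy.Request objects."""
    for product in products:
        yield product["url"]

products = [{"url": "/a"}, {"url": "/b"}, {"url": "/c"}]
print(parse_broken(products))       # ['/c']
print(list(parse_fixed(products)))  # ['/a', '/b', '/c']
```

In the actual spider, the same change means iterating over `response.css('.product-detail-slide')`, extracting the fields per item, and yielding a `scrapy.Request` for each `productUrl` inside the loop; the details printed in `parseProductPage` then correspond to one product each.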