I am trying to scrape MichaelKors.com. Until now I had been successful; my script just stopped working. None of the callback functions are being triggered. I have stripped everything out of those functions, and even then they are not called. Here is my code:
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from scrapy.selector import HtmlXPathSelector


class MichaelKorsClass(CrawlSpider):
    name = 'michaelkors'
    allowed_domains = ['www.michaelkors.com']
    start_urls = ['https://www.michaelkors.com/women/clothing/dresses/_/N-28ei']

    rules = (
        # Rule(LinkExtractor(allow=('(.*\/_\/R-\w\w_)([\-a-zA-Z0-9]*)$', ), deny=('((.*investors.*)|(/info/)|(contact\-us)|(checkout))', )), callback='parse_product'),
        Rule(LinkExtractor(allow=('(.*\/_\/)(N-[\-a-zA-Z0-9]*)$',),
                           deny=('((.*investors.*)|(/info/)|(contact\-us)|(checkout) | (gifts))',),),
             callback='parse_list'),
    )

    def parse_product(self, response):
        self.log("HIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII")

    def parse_list(self, response):
        hxs = HtmlXPathSelector(response)
        url = response.url
        self.log("Helloww")
        is_listing_page = False
        product_count = hxs.select('//span[@class="product-count"]/text()').get()
        # print(re.findall('\d+', pc))
        try:
            product_count = int(product_count)
            is_listing_page = True
        except:
            is_listing_page = False

        if is_listing_page:
            for product_url in response.xpath('//ul[@class="product-wrapper product-wrapper-four-tile"]//li[@class="product-name-container"]/a/@href').getall():
                yield scrapy.Request(response.urljoin(product_url), callback=self.parse_product)
Here is the log:
2019-07-29 11:25:50 [scrapy.core.engine] INFO: Spider opened
2019-07-29 11:25:50 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-07-29 11:25:50 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-07-29 11:25:52 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.michaelkors.com/women/clothing/dresses/_/N-28ei> (referer: None)
2019-07-29 11:25:54 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.michaelkors.com/sale/view-all-sale/_/N-28zn> (referer: https://www.michaelkors.com/women/clothing/dresses/_/N-28ei)
2019-07-29 11:25:55 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.michaelkors.com/women/clothing/jumpsuits/_/N-18bkjwa> (referer: https://www.michaelkors.com/women/clothing/dresses/_/N-28ei)
2019-07-29 11:25:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.michaelkors.com/women/clothing/t-shirts-sweatshirts/_/N-10dkew5> (referer: https://www.michaelkors.com/women/clothing/dresses/_/N-28ei)
....
Neither "Helloww" nor "HIIII..." is ever printed.
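
To sanity-check the Rule, the allow/deny regexes can be run directly against the URLs that appear in the log with a standalone snippet like the one below (plain re, separate from the spider):

import re

# Same allow/deny patterns as in the Rule above, copied verbatim.
allow = r'(.*\/_\/)(N-[\-a-zA-Z0-9]*)$'
deny = r'((.*investors.*)|(/info/)|(contact\-us)|(checkout) | (gifts))'

# URLs taken from the crawl log.
urls = [
    'https://www.michaelkors.com/women/clothing/dresses/_/N-28ei',
    'https://www.michaelkors.com/sale/view-all-sale/_/N-28zn',
    'https://www.michaelkors.com/women/clothing/jumpsuits/_/N-18bkjwa',
]

for url in urls:
    print(url, 'allow:', bool(re.search(allow, url)), 'deny:', bool(re.search(deny, url)))

The allow pattern matches all three crawled URLs and the deny pattern matches none of them, so those responses should be routed to parse_list, yet the callback never runs.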