Scrapy does not stop

Date: 2019-09-12 10:34:04

Tags: python-2.7 scrapy centos7

After yielding the items, Scrapy does not stop.

Usually I scrape two pages per item: the first to take the links, the second for the content.

But now I have to scrape a third page.

Is it possible to detect that it is no longer scraping anything?
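(Not part of the original question, but relevant to it: Scrapy ships a built-in CloseSpider extension whose settings can bound a crawl that hangs. A minimal sketch for settings.py or a spider's custom_settings; the numeric values are illustrative, not recommendations.)

```python
# Sketch: settings for Scrapy's built-in CloseSpider extension.
# These setting names are real Scrapy settings; the values are examples.
CLOSESPIDER_TIMEOUT = 3600     # close the spider after it has been open this many seconds
CLOSESPIDER_ITEMCOUNT = 1000   # close after scraping this many items
CLOSESPIDER_PAGECOUNT = 500    # close after crawling this many responses
CLOSESPIDER_ERRORCOUNT = 10    # close after this many errors

# Separately, DOWNLOAD_TIMEOUT bounds how long a single request may wait
# before the downloader gives up (Scrapy's default is 180 seconds).
DOWNLOAD_TIMEOUT = 60
```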

def parse(self, response):
    self.logger.info('response.url %s', response.url)

    while True:
        try:
            posts = response.xpath(self.ITEM_RESULT)
            num = 0

            for post in posts:
                item = SpiderItem()
                # self.logger.info('post ciclo %s', post.xpath(self.URL_CSS).extract()[num])
                item['url'] = sito + post.xpath(self.URL_CSS).extract()[num]
                num += 1
                if item['url']:
                    # self.logger.info('post ciclo %s', item['url'])

                    yield Request(url=item['url'], callback=self.parse_ad, meta={'item': item})
        except IndexError:
            # extract()[num] ran past the end of the post list: all links queued
            break
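(Aside, not from the question: the counter-based indexing in parse, extract()[num] with a manually incremented num inside while True plus a bare except, is fragile because it relies on an eventual IndexError to terminate. A plain loop that extracts the URL from each post individually needs no counter and no exception. A minimal sketch with stand-in data; posts here are plain dicts, whereas in the real spider they would be selectors and the lookup would be post.xpath(self.URL_CSS).extract_first().)

```python
# Sketch of the link-extraction loop without the counter/while-True pattern.
# `posts` stands in for the selector list; `sito` for the site prefix used
# in the original code. Both are hypothetical here.
sito = 'http://www.example.com'

posts = [
    {'url': '/ad/1'},
    {'url': '/ad/2'},
    {'url': None},      # a post with no link is simply skipped
]

def extract_links(posts, prefix):
    """Yield one absolute URL per post; terminates when the list is exhausted."""
    for post in posts:
        relative = post.get('url')   # in Scrapy: post.xpath(URL_CSS).extract_first()
        if relative:
            yield prefix + relative

links = list(extract_links(posts, sito))
```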

def parse_ad(self, response):
    item = response.meta['item']
    single_ad = Selector(response)

    img_url = ''
    img_url_two = ''

    self.logger.info('RESPONSE URL PARSE = %s', response.url)

    nexta = response.xpath('//div[contains(@class, "anuRefBox")]/b').extract_first()
    nexta = comm.controls.eta(nexta)

    link_frame = 'http://www.xxxx.com/xxx-xxx/?id=' + nexta
    try:
        item['description'] = single_ad.xpath(self.DESCRIPTION_CSS).extract_first()
        if item['description']:
            item['description'] = comm.controls.descrizione(item['description'])
    except IndexError:
        item['description'] = ''

    today = datetime.date.today()
    now = today.strftime('%Y-%m-%d')
    item['date'] = now

    if link_frame:
        # self.logger.info('post ciclo %s', item['url'])
        # sleep(2)
        yield Request(url=link_frame, callback=self.parse_frame_ad, meta={'item': item}, dont_filter=True)

def parse_frame_ad(self, response):
    if 'ConnectionWrongStateError' in response.body:
        raise scrapy.exceptions.CloseSpider('ConnectionWrongStateError')

    item = response.meta['item']
    single_frame_ad = Selector(response)
    self.logger.info('RESPONSE FRAME URL PARSE = %s', response.url)

......

    item['session_path'] = now
    # self.logger.info('images %s', images)
    self.logger.info('PATH %s', item['session_path'])

    if item:
        yield item
    else:
        raise DropItem("Missing item %s" % item)

def close(self, spider):
    today = datetime.datetime.now()
    now = today.strftime('%Y-%m-%d %H_%M_%S')
    self.logger.info('SCRAPY FINE %s', now)

Now Scrapy does not stop. It seems to be waiting for something:

2019-09-12 10:07:59 [scrapy.extensions.logstats] INFO: Crawled 116 pages (at 0 pages/min), scraped 29 items (at 1 items/min)
2019-09-12 10:08:59 [scrapy.extensions.logstats] INFO: Crawled 116 pages (at 0 pages/min), scraped 29 items (at 0 items/min)
2019-09-12 10:09:59 [scrapy.extensions.logstats] INFO: Crawled 116 pages (at 0 pages/min), scraped 29 items (at 0 items/min)
2019-09-12 10:10:59 [scrapy.extensions.logstats] INFO: Crawled 116 pages (at 0 pages/min), scraped 29 items (at 0 items/min)
2019-09-12 10:11:59 [scrapy.extensions.logstats] INFO: Crawled 116 pages (at 0 pages/min), scraped 29 items (at 0 items/min)
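(Editorial note, not from the question: the log shows the crawl frozen at 116 pages with a rate of 0 pages/min for several minutes in a row, the signature of requests that never complete or callbacks that never yield. As an illustration only, a stall like this can be detected mechanically from consecutive logstats lines; the hypothetical helper below assumes the exact logstats line format shown above.)

```python
import re

# Hypothetical helper: report a stall when `threshold` consecutive logstats
# lines show a page rate of 0 pages/min. The regex follows the logstats
# format in the excerpt above.
RATE_RE = re.compile(r'Crawled (\d+) pages \(at (\d+) pages/min\)')

def is_stalled(log_lines, threshold=3):
    zero_streak = 0
    for line in log_lines:
        m = RATE_RE.search(line)
        if not m:
            continue  # not a logstats line
        zero_streak = zero_streak + 1 if m.group(2) == '0' else 0
        if zero_streak >= threshold:
            return True
    return False

lines = [
    'INFO: Crawled 116 pages (at 0 pages/min), scraped 29 items (at 1 items/min)',
    'INFO: Crawled 116 pages (at 0 pages/min), scraped 29 items (at 0 items/min)',
    'INFO: Crawled 116 pages (at 0 pages/min), scraped 29 items (at 0 items/min)',
]
```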

0 Answers:

There are no answers yet.