CrawlerProcess without any items / Scrapy

Date: 2018-11-02 14:37:31

Tags: python scrapy web-crawler

Based on the documentation, I have made a very simple attempt at running spiders from a single file with CrawlerProcess. Here is the code:

import scrapy
from scrapy.crawler import CrawlerProcess

class BaseSpider(scrapy.Spider):
    def common_parse(self, response):
        yield {
            'test': response.css("title::text").extract()
        }


class MonoprixSpider(BaseSpider):
    # Your first spider definition
    name = "monoprix_bot"
    start_url = ['https://www.monoprix.fr/courses-en-ligne']

    def parse(self, response):
        self.common_parse(response)


class EbaySpider(BaseSpider):
    # Your second spider definition
    name = "ebay_bot"
    start_url = ['https://www.ebay.fr/']

    def parse(self, response):
        self.common_parse(response)

process = CrawlerProcess()
process.crawl(MonoprixSpider)
process.crawl(EbaySpider)
process.start() # the script will block here until all crawling jobs are finished

Both spiders open and close without yielding the page title (as a test). I previously had more complex Ebay and Monoprix spiders in two separate projects, and they worked fine...

Am I missing something obvious?

1 Answer:

Answer 0 (score: 0)

Please change start_url to start_urls:

start_urls = ['https://www.monoprix.fr/courses-en-ligne']

Since there is no start_urls attribute, you are essentially seeding the spider with nothing.
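
For reference, a minimal corrected sketch of the original script with the rename applied. Note a second issue the answer does not mention: each parse calls self.common_parse(response) without yielding its result, so the generator that common_parse returns is silently discarded; re-yielding it (for example with yield from) is also needed before any items appear:

import scrapy
from scrapy.crawler import CrawlerProcess

class BaseSpider(scrapy.Spider):
    def common_parse(self, response):
        # Yield the page title as a test item
        yield {
            'test': response.css("title::text").extract()
        }

class MonoprixSpider(BaseSpider):
    name = "monoprix_bot"
    start_urls = ['https://www.monoprix.fr/courses-en-ligne']  # renamed from start_url

    def parse(self, response):
        # Delegate to the shared parser and re-yield its items
        yield from self.common_parse(response)

class EbaySpider(BaseSpider):
    name = "ebay_bot"
    start_urls = ['https://www.ebay.fr/']  # renamed from start_url

    def parse(self, response):
        yield from self.common_parse(response)

process = CrawlerProcess()
process.crawl(MonoprixSpider)
process.crawl(EbaySpider)
process.start()  # blocks here until both crawls finish

With both changes in place, each spider requests its start URL and emits one item containing the page title, which Scrapy logs as a scraped item.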