Question

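I created a test spider. The spider receives an object that has url and xpath attributes. It crawls the url and fills the self.result dictionary accordingly, so self.result can be {'success':True,'httpresponse':200} or {'success':False,'httpresponse':404}, etc.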

The problem is that I don't know how to access spider.result, because I never get hold of a spider object.

..
    def test(self):
        from scrapy.crawler import CrawlerProcess
        ts = TestSpider

        process = CrawlerProcess({...})

        process.crawl(ts,[object,])
        process.start()
        print ts.result

I also tried:

    def test(self):
        from scrapy.crawler import CrawlerProcess
        ts = TestSpider(object)
        process = CrawlerProcess({...})

        process.crawl(ts)
        process.start()
        print ts.result

But then it says that crawl requires 2 arguments.
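In both attempts the result is read from the wrong place. process.crawl() takes the spider class (or a Crawler object), creates the spider instance itself and forwards any extra arguments to the spider's constructor. So ts.result in the first attempt reads the attribute from the class rather than from the instance that actually crawled, and passing a ready-made instance, as in the second attempt, is not supported (recent Scrapy versions reject it with a ValueError). The plain-Python toy below mirrors that flow without Scrapy; MiniCrawler and ToySpider are hypothetical names, not Scrapy APIs.

```python
class MiniCrawler:
    """Hypothetical, heavily simplified stand-in for scrapy.crawler.Crawler."""

    def __init__(self, spidercls):
        self.spidercls = spidercls  # a class, not an instance
        self.spider = None          # set once crawl() has built the spider

    def crawl(self, *args, **kwargs):
        # Build the spider from the class, forwarding the extra arguments
        # to its constructor, then let it do its work.
        self.spider = self.spidercls(*args, **kwargs)
        self.spider.run()


class ToySpider:
    """Hypothetical spider that fills self.result like the question's."""

    def __init__(self, target=None):
        self.target = target
        self.result = {}

    def run(self):
        # Pretend the fetch of self.target returned HTTP 200.
        self.result = {"success": True, "httpresponse": 200}


crawler = MiniCrawler(ToySpider)
crawler.crawl(target="http://example.com")

print(hasattr(ToySpider, "result"))  # False: result only exists on instances
print(crawler.spider.result)         # {'success': True, 'httpresponse': 200}
```

The point of the toy is that you need a reference to whatever holds the instance; in Scrapy that reference is the Crawler object.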

Do you know how to do this? I don't want to save the results to a file or a database.

Answer 1

This is how you call crawl:

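    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings
    process = CrawlerProcess(get_project_settings())
    process.crawl(TestSpider, arg1=val1, arg2=val2)  # pass the class, not an instance; kwargs go to the spider's __init__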

How to access spider attributes after crawling
