After working through the official tutorial, I decided to try building my own spider in the same project. I created parker_spider.py in the spiders directory, containing:
import scrapy
from scrapy import Request


class Parker_Spider(scrapy.Spider):
    name = "parker"
    start_urls = [
        "myurl"
    ]

    def make_requests(self):
        for i in range(self.max_id):
            yield Request('myurl', method="post", headers=headers, body=payload, callback=self.parse_method)

    def parse_method(self, response):
        print(response.body)
When I run:
$ scrapy runspider parker
2016-05-25 20:26:42 [scrapy] INFO: Scrapy 1.1.0 started (bot: tutorial)
2016-05-25 20:26:42 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'tutorial.spiders', 'SPIDER_MODULES': ['tutorial.spiders'], 'ROBOTSTXT_OBEY': True, 'BOT_NAME': 'tutorial'}
Usage
=====
scrapy runspider [options] <spider_file>
runspider: error: File not found: parker
What am I doing wrong?
Answer 0 (score: 9)
The runspider command expects a spider file name, not a spider name:
$ scrapy runspider parker_spider.py
And, since you have already created a Scrapy project and are running the spider from inside the project directory, it is better to use the crawl command, which takes the spider name:
$ scrapy crawl parker
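As a side note, Scrapy only invokes start_requests() when a spider starts; a method named make_requests() is never called automatically, so the POST requests above would not be scheduled even once the command issue is fixed. Below is a minimal sketch of how the spider file could look with that wired up; the URL, headers, payload and max_id values are placeholders carried over from the question, not real values:

import scrapy
from scrapy import Request


class ParkerSpider(scrapy.Spider):
    name = "parker"
    max_id = 10        # placeholder: number of requests to issue
    headers = {}       # placeholder: request headers
    payload = ""       # placeholder: POST body

    def start_requests(self):
        # Scrapy calls start_requests() itself when the spider starts,
        # so the POST requests are actually scheduled from here.
        for i in range(self.max_id):
            yield Request(
                'myurl',                 # placeholder URL from the question
                method="POST",
                headers=self.headers,
                body=self.payload,
                callback=self.parse_method,
            )

    def parse_method(self, response):
        # Inspect the raw response body.
        print(response.body)

With the file saved as parker_spider.py, scrapy runspider parker_spider.py (run from the directory containing the file) or scrapy crawl parker (run from inside the project) should then pick the spider up.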