The documentation says I can only execute the crawl command inside the project directory:
scrapy crawl tutor -o items.json -t json
But I really need to run it from my own Python code (and that Python file is not inside the project directory).
Is there any way to do what I want?
My project tree:
.
├── etao
│   ├── etao
│   │   ├── __init__.py
│   │   ├── items.py
│   │   ├── pipelines.py
│   │   ├── settings.py
│   │   └── spiders
│   │       ├── __init__.py
│   │       └── etao_spider.py
│   ├── items.json
│   ├── scrapy.cfg
│   └── start.py
└── start.py   <-------------- I want to execute the script here.
The code below follows link, but it does not work:
#!/usr/bin/env python
import os
# Must be at the top before other imports
os.environ.setdefault('SCRAPY_SETTINGS_MODULE', 'project.settings')

from scrapy import project
from scrapy.conf import settings
from scrapy.crawler import CrawlerProcess

class CrawlerScript():

    def __init__(self):
        self.crawler = CrawlerProcess(settings)
        if not hasattr(project, 'crawler'):
            self.crawler.install()
        self.crawler.configure()

    def crawl(self, spider_name):
        spider = self.crawler.spiders.create(spider_name)  # <--- line 19
        if spider:
            self.crawler.queue.append_spider(spider)
        self.crawler.start()
        self.crawler.stop()

# main
if __name__ == '__main__':
    crawler = CrawlerScript()
    crawler.crawl('etao')
The error is:
line 19: KeyError: 'Spider not found: etao'
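A likely cause, assuming the project is named etao as the tree above suggests: SCRAPY_SETTINGS_MODULE is set to 'project.settings' instead of 'etao.settings', and the inner etao directory (the one containing scrapy.cfg) is not on sys.path when the outer start.py runs, so Scrapy never registers the spider classes. A minimal sketch of the setup the script would probably need before any Scrapy imports (paths are assumptions based on the tree):

#!/usr/bin/env python
import os
import sys

# Assumption: the outer start.py lives next to the inner "etao" directory
# (the one that contains scrapy.cfg). Adjust the path if the layout differs.
project_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'etao')
sys.path.insert(0, project_dir)

# Point Scrapy at this project's real settings module, not 'project.settings'.
os.environ.setdefault('SCRAPY_SETTINGS_MODULE', 'etao.settings')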
Answer 0 (score: 3)
You can actually call the CrawlerProcess yourself... something like:
from scrapy.crawler import CrawlerProcess
from scrapy.conf import settings
settings.overrides.update({}) # your settings
crawlerProcess = CrawlerProcess(settings)
crawlerProcess.install()
crawlerProcess.configure()
crawlerProcess.crawl(spider) # your spider here
Credit to @warwaruk.
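Note that scrapy.conf, scrapy.project and the install()/configure() calls shown above come from old Scrapy releases and were removed in later versions. A minimal sketch of an equivalent standalone script for newer Scrapy versions, using CrawlerProcess with get_project_settings; the etao paths and the spider name 'etao' are assumptions taken from the question:

#!/usr/bin/env python
import os
import sys

# Assumed layout from the question: this script sits next to the inner
# "etao" directory (the one containing scrapy.cfg).
sys.path.insert(0, os.path.join(os.path.dirname(os.path.abspath(__file__)), 'etao'))
os.environ.setdefault('SCRAPY_SETTINGS_MODULE', 'etao.settings')

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())
process.crawl('etao')   # spider name, assumed to match the name attribute in etao_spider.py
process.start()         # blocks until the crawl finishes

The -o items.json -t json behaviour from the command line can be reproduced by overriding the feed-export settings (FEED_URI/FEED_FORMAT on older versions, or the FEEDS dict on newer ones) in the settings object before constructing the CrawlerProcess.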