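I'm trying to run a function retrieved from Redis (RQ) that creates a CrawlerProcess, but I get:
Work-horse process was terminated unexpectedly (waitpid returned 11)
Console log:
Moving job to 'failed' queue (work-horse terminated unexpectedly; waitpid returned 11)
The program dies on the line I marked with the comment
THIS LINE KILL THE PROGRAM
What am I doing wrong? How can I fix it?
The function itself is retrieved from RQ without problems: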
def custom_executor(url):
    process = CrawlerProcess({
        'USER_AGENT': "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.75 Safari/537.36",
        'DOWNLOAD_TIMEOUT': 20000,  # 100
        'ROBOTSTXT_OBEY': False,
        'HTTPCACHE_ENABLED': False,
        'REDIRECT_ENABLED': False,
        'SPLASH_URL': 'http://localhost:8050/',
        'DUPEFILTER_CLASS': 'scrapy_splash.SplashAwareDupeFilter',
        'HTTPCACHE_STORAGE': 'scrapy_splash.SplashAwareFSCacheStorage',
        'DOWNLOADER_MIDDLEWARES': {
            'scrapy_splash.SplashCookiesMiddleware': 723,
            'scrapy_splash.SplashMiddleware': 725,
            'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
        },
        'SPIDER_MIDDLEWARES': {
            'scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware': True,
            'scrapy.spidermiddlewares.httperror.HttpErrorMiddleware': True,
            'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware': True,
            'scrapy.extensions.closespider.CloseSpider': True,
            'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
        }
    })
    ### THIS LINE KILL THE PROGRAM
    process.crawl(ExtractorSpider,
                  start_urls=[url, ], es_client=es_get_connection(),
                  redis_conn=redis_get_connection())
    process.start()
Here is my ExtractorSpider:
class ExtractorSpider(Spider):
    name = "Extractor Spider"
    handle_httpstatus_list = [301, 302, 303]

    def parse(self, response):
        yield SplashRequest(url=url, callback=process_screenshot,
                            endpoint='execute', args=SPLASH_ARGS)
Thanks
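Answer 0: (score: 2)
The process crashed because the heavy computation left it without enough memory. Increasing the available memory fixed the problem.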
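To check what actually terminated the work horse, the number in the log message can be decoded with the standard library. This is only a sketch: it assumes the value RQ prints is the raw status returned by os.waitpid, it works on POSIX systems only, and the helper name describe_wait_status is made up for illustration.

```python
import os
import signal


def describe_wait_status(status):
    """Describe a raw os.waitpid() status word (POSIX only)."""
    if os.WIFSIGNALED(status):
        # The child was terminated by a signal; look up its symbolic name.
        sig = signal.Signals(os.WTERMSIG(status))
        return "killed by signal {} ({})".format(sig.value, sig.name)
    if os.WIFEXITED(status):
        # The child exited normally; report its exit code.
        return "exited with code {}".format(os.WEXITSTATUS(status))
    return "unrecognised status {}".format(status)


# Assumption: 11 is the raw status taken from the "waitpid returned 11" message.
print(describe_wait_status(11))
```

On Linux, signal 11 is SIGSEGV (a segmentation fault), so a status of 11 means the worker's child process was terminated by that signal rather than exiting on its own.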
Answer 1: (score: 0)
For me, the process was timing out, and I had to change the default timeout.
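If the timeout is the cause, a longer limit can be passed when the job is enqueued. A minimal sketch, assuming a recent RQ release in which Queue.enqueue accepts a job_timeout keyword (older releases called it timeout); the helper enqueue_crawl and the 600-second value are made up for illustration.

```python
def enqueue_crawl(queue, func, url, timeout_seconds=600):
    """Enqueue func(url) with an explicit job timeout.

    `queue` is expected to behave like an rq.Queue. The 600-second default
    is an arbitrary illustrative value, not a recommendation.
    """
    # Assumption: job_timeout is the keyword used by recent RQ releases.
    return queue.enqueue(func, url, job_timeout=timeout_seconds)


# Typical usage (needs the redis and rq packages and a running Redis server):
#   from redis import Redis
#   from rq import Queue
#   job = enqueue_crawl(Queue(connection=Redis()), custom_executor, "http://example.com")
```

Passing the queue in as an argument keeps the helper independent of how the Redis connection is created.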