How to handle a 302 redirect in scrapy shell

Asked: 2016-02-09 05:07:14

Tags: scrapy

I'm trying to scrape a page, but I get redirected. I tried setting a user agent, but that didn't help either.

I saw this in another question:

meta = {'dont_redirect': True,'handle_httpstatus_list': [302]}

How can I test this in scrapy shell?

1 Answer:

Answer 0: (score: 1)

When using scrapy shell, the simplest approach is probably to disable RedirectMiddleware from the command line by passing the setting REDIRECT_ENABLED=0.

Compare the two runs. With redirects disabled entirely:

$ scrapy shell -s REDIRECT_ENABLED=0
2016-02-09 10:16:27 [scrapy] INFO: Scrapy 1.0.4 started (bot: scrapybot)
2016-02-09 10:16:27 [scrapy] INFO: Optional features available: ssl, http11
2016-02-09 10:16:27 [scrapy] INFO: Overridden settings: {'REDIRECT_ENABLED': '0', 'LOGSTATS_INTERVAL': 0, 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter'}
2016-02-09 10:16:30 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, CoreStats, SpiderState
2016-02-09 10:16:32 [scrapy] INFO: Enabled downloader middlewares:
HttpAuthMiddleware, 
DownloadTimeoutMiddleware, 
UserAgentMiddleware,
RetryMiddleware,
DefaultHeadersMiddleware,
MetaRefreshMiddleware,
HttpCompressionMiddleware,
CookiesMiddleware,
ChunkedTransferMiddleware,
DownloaderStats
2016-02-09 10:16:33 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2016-02-09 10:16:33 [scrapy] INFO: Enabled item pipelines: 
2016-02-09 10:16:33 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-02-09 10:16:39 [root] DEBUG: Using default logger

(Notice that RedirectMiddleware is missing from the "Enabled downloader middlewares" list.)

With the default settings:

$ scrapy shell
2016-02-09 10:17:18 [scrapy] INFO: Scrapy 1.0.4 started (bot: scrapybot)
2016-02-09 10:17:18 [scrapy] INFO: Optional features available: ssl, http11
2016-02-09 10:17:18 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0, 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter'}
2016-02-09 10:17:19 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, CoreStats, SpiderState
2016-02-09 10:17:19 [scrapy] INFO: Enabled downloader middlewares:
HttpAuthMiddleware,
DownloadTimeoutMiddleware,
UserAgentMiddleware,
RetryMiddleware,
DefaultHeadersMiddleware,
MetaRefreshMiddleware,
HttpCompressionMiddleware,
RedirectMiddleware,
CookiesMiddleware,
ChunkedTransferMiddleware,
DownloaderStats
2016-02-09 10:17:19 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2016-02-09 10:17:19 [scrapy] INFO: Enabled item pipelines: 
2016-02-09 10:17:19 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-02-09 10:17:19 [root] DEBUG: Using default logger
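
If you would rather keep RedirectMiddleware enabled and test the meta from the question on a single request, one option is to build the Request yourself and pass it to the shell's fetch() helper, which accepts a Request object as well as a URL. The sketch below is only an illustration: the helper names no_redirect_meta and build_request are made up for this example, and the URL is a placeholder.

```python
def no_redirect_meta(statuses=(302,)):
    """Return Request.meta that asks Scrapy not to follow redirects
    and to hand responses with these statuses to the caller."""
    return {"dont_redirect": True, "handle_httpstatus_list": list(statuses)}


def build_request(url, statuses=(302,)):
    """Hypothetical helper: wrap a URL in a Request carrying the meta above."""
    # Imported lazily so no_redirect_meta() works without Scrapy installed.
    from scrapy import Request
    return Request(url, meta=no_redirect_meta(statuses))


# Inside a running `scrapy shell` session, after pasting the two helpers:
#   >>> fetch(build_request("http://example.com/some-page"))  # placeholder URL
#   >>> response.status                   # status of the unfollowed response
#   >>> response.headers.get("Location")  # redirect target, if the server sent one
```

Compared with -s REDIRECT_ENABLED=0, this affects only the requests you build this way, so other fetch() calls in the same session still follow redirects.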