These are my rules. This is my first time using CrawlSpider, so how can I stop the 302 redirects for the links my rules follow?
rules = (
    Rule(LinkExtractor(allow=r'zhaopin/.*'), follow=True),
    Rule(LinkExtractor(allow=r'gongsi/j.*/.html'), follow=True),
    Rule(LinkExtractor(allow=r'jobs/.*.html'), callback='parse_job', follow=True),
)
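To see which of these rules a given URL would trigger, the patterns can be tried outside Scrapy. This is only a sketch: it assumes `LinkExtractor` applies each `allow` pattern as a regex search on the URL, and the `jobs/...` URLs and the `matching_rules` helper are made up for illustration.

```python
import re

# Patterns copied verbatim from the rules above.
PATTERNS = {
    "zhaopin": r"zhaopin/.*",
    "gongsi": r"gongsi/j.*/.html",
    "jobs": r"jobs/.*.html",
}


def matching_rules(url):
    """Return the names of the patterns that match anywhere in the URL."""
    return [name for name, pattern in PATTERNS.items() if re.search(pattern, url)]


# URL taken from the debug log.
print(matching_rules("https://www.lagou.com/zhaopin/CTO/"))

# Hypothetical job URLs: the unescaped "." before "html" matches any character,
# so the second URL is accepted as well.
print(matching_rules("https://www.lagou.com/jobs/12345.html"))
print(matching_rules("https://www.lagou.com/jobs/12345xhtml"))
```

If only real `.html` pages should match, writing the dot as `\.` in the pattern would tighten it.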
Here is the debug log. As you can see, every request is redirected (302) to the login page:
2017-07-05 09:20:24 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://passport.lagou.com/login/login.html?msg=validation&uStatus=2&clientIp=60.211.222.66> from <GET https://www.lagou.com/zhaopin/CTO/>
2017-07-05 09:20:25 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://passport.lagou.com/login/login.html?msg=validation&uStatus=2&clientIp=60.211.222.66> from <GET https://www.lagou.com/zhaopin/jiagoushi/>
2017-07-05 09:20:25 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://passport.lagou.com/login/login.html?msg=validation&uStatus=2&clientIp=60.211.222.66> from <GET https://www.lagou.com/zhaopin/C%23/>
2017-07-05 09:20:25 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://passport.lagou.com/login/login.html?msg=validation&uStatus=2&clientIp=60.211.222.66> from <GET https://www.lagou.com/zhaopin/youxizhizuoren/>
2017-07-05 09:20:25 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://passport.lagou.com/login/login.html?msg=validation&uStatus=2&clientIp=60.211.222.66> from <GET https://www.lagou.com/zhaopin/chanpinbujingli/>
2017-07-05 09:20:25 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://passport.lagou.com/login/login.html?msg=validation&uStatus=2&clientIp=60.211.222.66> from <GET https://www.lagou.com/zhaopin/wuxianchanpinshejishi/>
2017-07-05 09:20:25 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://passport.lagou.com/login/login.html?msg=validation&uStatus=2&clientIp=60.211.222.66> from <GET https://www.lagou.com/zhaopin/wangyechanpinshejishi/>
2017-07-05 09:20:25 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://passport.lagou.com/login/login.html?msg=validation&uStatus=2&clientIp=60.211.222.66> from <GET https://www.lagou.com/zhaopin/chanpinshixisheng/>
2017-07-05 09:20:25 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://passport.lagou.com/login/login.html?msg=validation&uStatus=2&clientIp=60.211.222.66> from <GET https://www.lagou.com/zhaopin/dbaqita/>
2017-07-05 09:20:25 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://passport.lagou.com/login/login.html?msg=validation&uStatus=2&clientIp=60.211.222.66> from <GET https://www.lagou.com/zhaopin/guanggaoshejishi/>
2017-07-05 09:20:26 [scrapy.crawler] INFO: Received SIG_UNBLOCK, shutting down gracefully. Send again to force
Answer 0 (score: 0)
Add a Cookie and a User-Agent in your settings, for example:
DEFAULT_REQUEST_HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36',
    'Cookie': 'user_trace_token=201708...',
    'Referer': 'https://www.lagou.com'
}
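Since the question asks about stopping the 302 inside the rules themselves, one possible complement to the headers above is a per-rule request hook. The sketch below assumes Scrapy's `dont_redirect` and `handle_httpstatus_list` request meta keys and the `process_request` argument of `Rule`. The function name `keep_redirect_response` is hypothetical, and the optional `response` parameter is there only so the same function fits both the older one-argument and the newer two-argument hook signatures.

```python
def keep_redirect_response(request, response=None):
    """Mark a request so the 302 is handed to the spider instead of being followed.

    `request` is expected to expose a dict-like `meta` attribute, as Scrapy
    requests do. `response` is accepted but unused.
    """
    # Ask the redirect middleware not to follow the Location header.
    request.meta["dont_redirect"] = True
    # Let the 302 response through to the callback instead of being filtered out.
    request.meta["handle_httpstatus_list"] = [302]
    return request


# Assumed usage inside the spider's rules (not executed here):
# Rule(LinkExtractor(allow=r'zhaopin/.*'), follow=True,
#      process_request=keep_redirect_response),
```

This only stops Scrapy from following the redirect; the site still answers with a 302 to the login page, so the headers are what may actually avoid it. Depending on the Scrapy version, the built-in cookies middleware may replace a manually set `Cookie` header, and setting `COOKIES_ENABLED = False` is a common way to make sure the header is sent as written.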