Question

I can't figure out how to get Scrapy to scrape pages that return a 301 redirect. When I add

handle_httpstatus_list = [301,302]

the log stops telling me

2015-09-29 09:45:06 [scrapy] DEBUG: Crawled (301) <GET http://www.example.com/conditions-generales/> (referer: http://www.example.com/)
2015-09-29 09:45:07 [scrapy] DEBUG: Ignoring response <301 http://www.example.com/conditions-generales/>: HTTP status code is not handled or not allowed

but Scrapy only crawls the 301 redirect pages and never scrapes data from them (whereas it does for pages with a 200 HTTP status code).

I then get:

2015-09-29 09:55:39 [scrapy] DEBUG: Crawled (301) <GET http://www.example.com/espace-annonceurs/> (referer: http://www.example.com/)

but never:

2015-09-29 09:55:39 [scrapy] DEBUG: Scraped from <301 http://www.example.com/espace-annonceurs/>

I just want to scrape http://www.example.com/espace-annonceurs/ the same way I would if it returned a 200 HTTP status code.

I think I have to use a middleware, but I don't know how to do that.

Thanks for your help.

Scrapy crawls 301 redirect pages but does not scrape data from them

0 answers: