Scrapy keeps redirecting (meta refresh)

Time: 2019-08-21 07:16:21

Tags: python scrapy

I am trying to scrape a website. However, the host keeps redirecting the spider until it hits "max redirections reached". The log is as follows:

2019-08-21 17:10:56 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET http://zjip.patsev.com/pldb-zj/> from <GET http://zjip.patsev.com/>
2019-08-21 17:10:56 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (meta refresh) to <GET http://zjip.patsev.com/pldb-zj/access/toLogin> from <GET http://zjip.patsev.com/pldb-zj/>
2019-08-21 17:10:57 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET http://open.cnipr.com/oauth/authorize?client_id=8A3C47AC471F1D588A0F84B93E540C06&response_type=code&redirect_uri=http://zjip.patsev.com/pldb-zj/access/oauthLogin> from <GET http://zjip.patsev.com/pldb-zj/access/toLogin>
2019-08-21 17:10:57 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET http://zjip.patsev.com/pldb-zj/?client_id=8A3C47AC471F1D588A0F84B93E540C06&response_type=code&redirect_uri=http://zjip.patsev.com/pldb-zj/access/oauthLogin> from <GET http://open.cnipr.com/oauth/authorize?client_id=8A3C47AC471F1D588A0F84B93E540C06&response_type=code&redirect_uri=http://zjip.patsev.com/pldb-zj/access/oauthLogin>
2019-08-21 17:10:58 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (meta refresh) to <GET http://zjip.patsev.com/pldb-zj/access/toLogin> from <GET http://zjip.patsev.com/pldb-zj/?client_id=8A3C47AC471F1D588A0F84B93E540C06&response_type=code&redirect_uri=http://zjip.patsev.com/pldb-zj/access/oauthLogin>
2019-08-21 17:10:58 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET http://open.cnipr.com/oauth/authorize?client_id=8A3C47AC471F1D588A0F84B93E540C06&response_type=code&redirect_uri=http://zjip.patsev.com/pldb-zj/access/oauthLogin> from <GET http://zjip.patsev.com/pldb-zj/access/toLogin>
2019-08-21 17:10:58 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET http://zjip.patsev.com/pldb-zj/?client_id=8A3C47AC471F1D588A0F84B93E540C06&response_type=code&redirect_uri=http://zjip.patsev.com/pldb-zj/access/oauthLogin> from <GET http://open.cnipr.com/oauth/authorize?client_id=8A3C47AC471F1D588A0F84B93E540C06&response_type=code&redirect_uri=http://zjip.patsev.com/pldb-zj/access/oauthLogin>
2019-08-21 17:10:59 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (meta refresh) to <GET http://zjip.patsev.com/pldb-zj/access/toLogin> from <GET http://zjip.patsev.com/pldb-zj/?client_id=8A3C47AC471F1D588A0F84B93E540C06&response_type=code&redirect_uri=http://zjip.patsev.com/pldb-zj/access/oauthLogin>
2019-08-21 17:10:59 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET http://open.cnipr.com/oauth/authorize?client_id=8A3C47AC471F1D588A0F84B93E540C06&response_type=code&redirect_uri=http://zjip.patsev.com/pldb-zj/access/oauthLogin> from <GET http://zjip.patsev.com/pldb-zj/access/toLogin>
2019-08-21 17:11:00 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET http://zjip.patsev.com/pldb-zj/?client_id=8A3C47AC471F1D588A0F84B93E540C06&response_type=code&redirect_uri=http://zjip.patsev.com/pldb-zj/access/oauthLogin> from <GET http://open.cnipr.com/oauth/authorize?client_id=8A3C47AC471F1D588A0F84B93E540C06&response_type=code&redirect_uri=http://zjip.patsev.com/pldb-zj/access/oauthLogin>

Every redirect up to this point seems to be necessary:

2019-08-21 17:10:57 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET http://zjip.patsev.com/pldb-zj/?client_id=8A3C47AC471F1D588A0F84B93E540C06&response_type=code&redirect_uri=http://zjip.patsev.com/pldb-zj/access/oauthLogin> from <GET http://open.cnipr.com/oauth/authorize?client_id=8A3C47AC471F1D588A0F84B93E540C06&response_type=code&redirect_uri=http://zjip.patsev.com/pldb-zj/access/oauthLogin>

After that, however, the spider still keeps being refreshed through the same URLs.
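
As a rough aid for finding where the loop begins, the sketch below parses Scrapy redirect DEBUG entries and reports the first hop that leads back to a URL already visited. It uses only the standard library and assumes each log entry sits on a single line (not wrapped by the terminal). The names `REDIRECT_RE`, `redirect_hops` and `first_repeated_target` are hypothetical helpers, not part of Scrapy.

```python
import re

# One Scrapy redirect DEBUG entry: reason, target URL, source URL.
# Assumes each entry sits on a single line (no terminal wrapping).
REDIRECT_RE = re.compile(
    r"Redirecting \((?P<reason>[^)]+)\) to <GET (?P<target>\S+)> "
    r"from <GET (?P<source>\S+)>"
)


def redirect_hops(log_text):
    """Return (reason, source_url, target_url) tuples in log order."""
    return [
        (m["reason"], m["source"], m["target"])
        for m in REDIRECT_RE.finditer(log_text)
    ]


def first_repeated_target(hops):
    """Return the index of the first hop that leads to an already seen URL, or None."""
    seen = set()
    for index, (_reason, source, target) in enumerate(hops):
        seen.add(source)
        if target in seen:
            return index
        seen.add(target)
    return None
```

Applied to the log above with each entry joined onto one line, this should point at the meta refresh at 17:10:58, which returns to http://zjip.patsev.com/pldb-zj/access/toLogin, a URL already visited at 17:10:56.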

Do you know how to inspect the responses of the redirects, and how to stop redirecting at the appropriate point? Many thanks!


Update: The URL I checked in the browser is http://zjip.patsev.com/. If I use requests, I do not run into the same problem:

import requests
res = requests.get('http://zjip.patsev.com/', proxies=proxy_dict, headers=headers)
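
One possible explanation for the difference, offered as a hypothesis rather than a confirmed diagnosis: requests follows HTTP 3xx redirects but does not act on HTML meta refresh tags, whereas Scrapy's MetaRefreshMiddleware does by default. The sketch below lists Scrapy options that could be tried to limit or inspect redirects. The setting names and meta keys are Scrapy's own; the dictionary names `REDIRECT_SETTINGS` and `INSPECT_META` and the value 5 are illustrative assumptions.

```python
# Sketch of Scrapy options for limiting or inspecting redirects.
# Setting names and meta keys are Scrapy's; the two dict names are hypothetical.

# Project- or spider-level settings (for example in a spider's custom_settings).
REDIRECT_SETTINGS = {
    "METAREFRESH_ENABLED": False,  # do not follow <meta http-equiv="refresh"> tags
    "REDIRECT_MAX_TIMES": 5,       # give up after 5 redirects of one request (default is 20)
}

# Per-request meta, passed as scrapy.Request(url, meta=INSPECT_META).
INSPECT_META = {
    "dont_redirect": True,                 # redirect middlewares leave this request alone
    "handle_httpstatus_list": [301, 302],  # let these statuses reach the spider callback
}
```

With `dont_redirect` set, the callback receives the 302 response itself and can read its `Location` header before deciding whether to follow it. When redirects are left enabled, the URLs a request passed through are available in the callback as `response.meta.get("redirect_urls")`.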

0 answers:

No answers yet