scrapy shell "https://www.winemag.com/wine-ratings/2/"
response
But I get:
2019-02-19 14:16:35 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-02-19 14:16:35 [scrapy.core.engine] INFO: Spider opened
2019-02-19 14:16:35 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.winemag.com/robots.txt> (referer: None)
2019-02-19 14:16:35 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET http://www.winemag.com/wine-ratings> from <GET https://www.winemag.com/wine-ratings/2/>
2019-02-19 14:16:35 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.winemag.com/wine-ratings> from <GET http://www.winemag.com/wine-ratings>
2019-02-19 14:16:35 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.winemag.com/wine-ratings/> from <GET https://www.winemag.com/wine-ratings>
2019-02-19 14:16:35 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.winemag.com/wine-ratings/> (referer: None)
<200 https://www.winemag.com/wine-ratings/>
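I don't understand why I end up at https://www.winemag.com/wine-ratings/ instead of the full URL I requested. Could someone give me a suggestion?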
Answer 0 (score: 1)
It seems winemag redirects the crawler to its main wine-ratings page:
⇾ curl -I 'https://www.winemag.com/wine-ratings/2/'
HTTP/2 301
[...]
location: http://www.winemag.com/wine-ratings
[...]
So this looks like expected Scrapy behavior: Scrapy is simply following the redirect returned by the site you requested.
Answer 1 (score: 0)
I found the answer. I had to set USER_AGENT in the settings file.
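A minimal sketch of that fix, assuming a standard Scrapy project where settings live in `settings.py`. The user-agent string below is an illustrative browser-like value chosen as an assumption; the answer does not say which string was used, and it is not confirmed that winemag accepts this particular one.

```python
# settings.py (Scrapy project settings file)
# Override Scrapy's default user agent, "Scrapy/VERSION (+https://scrapy.org)",
# with a browser-like string. The exact value is an example assumption;
# the answer only states that USER_AGENT had to be set.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/72.0.3626.109 Safari/537.36"
)
```

Note that `scrapy shell` only reads `settings.py` when it is run from inside the project directory. Outside a project, the same setting can be passed with the global `-s` option, e.g. `scrapy shell -s USER_AGENT="Mozilla/5.0 ..." "https://www.winemag.com/wine-ratings/2/"`. Checking `response.url` afterwards shows whether the request was still redirected.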