When I crawl the website, I get this error message:
Ignoring non-200 response
But when I open the same site in a browser, I get 200 OK.
My code looks like this:
# imports needed for the errback checks
from scrapy.spidermiddlewares.httperror import HttpError
from twisted.internet.error import DNSLookupError, TCPTimedOutError, TimeoutError

[..]

        yield scrapy.Request(url=url['name'], callback=self.parse, errback=self.errbacktest,
                             meta={'websiteId': url['websiteId']})

    def errbacktest(self, failure):
        print(failure)
        if failure.check(HttpError):
            # these exceptions come from the HttpError spider middleware
            # the non-200 response is available on the failure
            response = failure.value.response
            print('HttpError on %s' % response)
        elif failure.check(DNSLookupError):
            # this is the original request
            request = failure.request
            print('DNSLookupError on %s' % request.url)
        elif failure.check(TimeoutError, TCPTimedOutError):
            request = failure.request
            print('TimeoutError on %s' % request.url)

    def parse(self, response):
        print(response.status)
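As an aside, the failure above is raised by Scrapy's HttpError spider middleware, which filters out non-200 responses before they reach parse. To see the actual status code and headers the site sends to the crawler, such responses can be let through via the spider's handle_httpstatus_list attribute (or the HTTPERROR_ALLOWED_CODES setting). A minimal sketch; the class name, spider name, and status list are placeholders:

import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'
    # let these status codes reach the callback instead of being
    # dropped by the HttpError spider middleware
    handle_httpstatus_list = [403, 429, 503]

    def parse(self, response):
        # prints the real status and Server header the site returns
        print(response.status, response.headers.get('Server'))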
What could the problem be in this case?
Answer 0 (score: 0)
Solution:

yield scrapy.Request(url=url['name'], callback=self.parse, errback=self.errbacktest,
                     meta={'websiteId': url['websiteId']},
                     headers={'User-Agent': 'Mozilla/5.0'})

The site was blocking the crawler; adding a User-Agent header to the request fixed the problem.
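If every request needs the header, it can also be set once for the whole project instead of on each Request, using the standard Scrapy settings. A minimal sketch of the relevant settings.py entries; the exact User-Agent string below is only an example:

# settings.py
# project-wide user agent applied to every request
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'

# additional default headers can be set here as well
DEFAULT_REQUEST_HEADERS = {
    'Accept-Language': 'en',
}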