The website is https://www.extratodebito.detran.pr.gov.br/detranextratos/geraExtrato.do?action=iniciarProcesso
yield Request(self.url, callback=self.login_me, dont_filter=True)
It returns <html><head><title>Error</title></head><body>Unauthorized</body></html>
But when I use the requests library instead, it works fine!
Why does this happen?
Update:
The normal headers look like this:
Host: www.extratodebito.detran.pr.gov.br
Connection: keep-alive
Cache-Control: max-age=0
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Sec-Fetch-Site: none
Sec-Fetch-Mode: navigate
Sec-Fetch-User: ?1
Sec-Fetch-Dest: document
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
I added these to Scrapy, but I can see an Authorization field being added during the request:
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Authorization: Basic MTM2ZGNjNmFhOWZmNDA1Njk1YWU1MWE0ZjI1MzZlYzE6
Host: www.extratodebito.detran.pr.gov.br
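The Authorization value in the dump above uses the HTTP Basic scheme, so it is just base64 of "user:password" and can be decoded to see which credentials are being sent. A minimal sketch using only the standard library follows; the helper name decode_basic_auth is hypothetical and not part of Scrapy.

```python
import base64


def decode_basic_auth(header_value):
    """Split an HTTP Basic 'Authorization' value into (user, password)."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic auth header")
    # Basic credentials are base64("user:password").
    decoded = base64.b64decode(token).decode("latin-1")
    user, _, password = decoded.partition(":")
    return user, password


# Header value copied from the request dump above.
user, password = decode_basic_auth(
    "Basic MTM2ZGNjNmFhOWZmNDA1Njk1YWU1MWE0ZjI1MzZlYzE6"
)
print(repr(user), repr(password))
```

Here the value decodes to a 32-character user name with an empty password, which suggests the credentials come from something configured in the project rather than from the site itself.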
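Update 2:
Solved by removing http_user and http_pass from the spider. They were meant for Splash, but scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware also used them to add an Authorization header to ordinary requests.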
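The underlying idea is that Basic credentials should only go to the host they were meant for. The sketch below illustrates that rule with the standard library alone; headers_for, basic_auth_header, the host splash.example.com and the user "apikey" are all hypothetical names for illustration and are not Scrapy or scrapy-splash APIs. As far as I know, newer Scrapy releases (2.5.1 and later) offer an http_auth_domain spider attribute for the same purpose, which is worth checking against the documentation for your version.

```python
import base64
from urllib.parse import urlparse


def basic_auth_header(user, password):
    """Build an HTTP Basic Authorization value from a user and password."""
    raw = f"{user}:{password}".encode("latin-1")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def headers_for(url, user, password, auth_domain):
    """Return an Authorization header only for auth_domain or its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    if host == auth_domain or host.endswith("." + auth_domain):
        return {"Authorization": basic_auth_header(user, password)}
    return {}  # any other host gets no credentials


SPLASH_HOST = "splash.example.com"  # hypothetical Splash endpoint
SITE_URL = (
    "https://www.extratodebito.detran.pr.gov.br"
    "/detranextratos/geraExtrato.do?action=iniciarProcesso"
)

splash_headers = headers_for(
    "https://splash.example.com/render.html", "apikey", "", SPLASH_HOST
)
site_headers = headers_for(SITE_URL, "apikey", "", SPLASH_HOST)
print(splash_headers)
print(site_headers)
```

With this rule the target site receives no Authorization header at all, which matches the requests that succeed.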
Answer 0 (score: 0)
It works fine for me when I add the Accept, Accept-Language and Accept-Encoding headers.
I tested it in scrapy shell:
headers = {'Accept': ['text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'], 'Accept-Language': ['en'], 'Accept-Encoding': ['gzip,deflate,br']}
url = "https://www.extratodebito.detran.pr.gov.br/detranextratos/geraExtrato.do?action=iniciarProcesso"
from scrapy import Request
req = Request(url, headers=headers)
fetch(req)
I get a 200 response:
2020-09-14 11:16:03 [scrapy.core.engine] INFO: Spider opened
2020-09-14 11:16:04 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.extratodebito.detran.pr.gov.br/detranextratos/geraExtrato.do?action=iniciarProcesso> (referer: None)