Question

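I want to use scrapy shell to test the response data for a URL that requires basic authentication credentials. I checked the scrapy shell documentation but could not find anything about it there.

I tried scrapy shell 'http://user:pwd@abc.com', but it did not work. Does anyone know how I can do this?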

Answer 1

If you only want to use the shell, you can start it without a URL:

$ scrapy shell

Then, inside the shell:

>>> from w3lib.http import basic_auth_header
>>> from scrapy import Request
>>> # replace the two placeholder strings with your real credentials
>>> auth = basic_auth_header('your_user', 'your_password')
>>> req = Request(url="http://example.com", headers={'Authorization': auth})
>>> fetch(req)

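This works because fetch accepts a Request object as well as a URL, and it updates the shell session (the request and response objects) with the result of that request.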
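If the credentials are already embedded in a URL, as in the question, they can be split out and turned into an Authorization header value by hand. The following is only a minimal sketch using the Python standard library; split_credentials and basic_auth_value are hypothetical helper names, not part of Scrapy or w3lib, and the latin-1 default is an assumption meant to mirror what basic_auth_header is understood to do.

```python
import base64
from urllib.parse import urlsplit, urlunsplit


def split_credentials(url):
    """Return (clean_url, user, password) for a URL like http://user:pwd@host/."""
    parts = urlsplit(url)
    user = parts.username or ""
    password = parts.password or ""
    # Rebuild the network location without the "user:pwd@" prefix.
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    clean_url = urlunsplit(
        (parts.scheme, host, parts.path, parts.query, parts.fragment)
    )
    return clean_url, user, password


def basic_auth_value(user, password, encoding="latin-1"):
    """Build the bytes value of a Basic Authorization header."""
    # Basic auth is the base64 encoding of "user:password".
    token = base64.b64encode(f"{user}:{password}".encode(encoding))
    return b"Basic " + token


clean_url, user, password = split_credentials("http://user:pwd@abc.com")
header = basic_auth_value(user, password)
```

In the shell, clean_url and header could then be passed as Request(url=clean_url, headers={'Authorization': header}) before calling fetch. Percent-encoded credentials and IPv6 hosts are not handled by this sketch.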

Answer 2

Use the HTTP auth middleware.

Make sure HttpAuthMiddleware is enabled in the settings, then define:

from scrapy.spiders import CrawlSpider


class MySpider(CrawlSpider):
    http_user = 'username'
    http_pass = 'password'
    ...

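as class variables in the spider.

Also, if the middleware is enabled in the settings, you do not need to put the login credentials in the URL.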
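To make the role of those class variables more concrete, here is a rough plain-Python sketch of the kind of logic such a middleware applies to each outgoing request. It is not Scrapy's actual code: FakeSpider and apply_basic_auth are hypothetical names, and the http_auth_domain attribute reflects an assumption that newer Scrapy versions restrict the credentials to a single domain.

```python
import base64
from urllib.parse import urlsplit


class FakeSpider:
    # Hypothetical stand-in for a spider; attribute names follow the answer.
    http_user = "username"
    http_pass = "password"
    # Assumption: credentials are only sent to this domain and its subdomains.
    http_auth_domain = "example.com"


def apply_basic_auth(spider, url, headers):
    """Return a copy of headers, adding Authorization when appropriate."""
    headers = dict(headers)
    user = getattr(spider, "http_user", "")
    password = getattr(spider, "http_pass", "")
    if not user and not password:
        return headers  # spider defines no credentials
    if "Authorization" in headers:
        return headers  # never overwrite an explicit header
    domain = getattr(spider, "http_auth_domain", None)
    host = urlsplit(url).hostname or ""
    if domain and not (host == domain or host.endswith("." + domain)):
        return headers  # request targets a different domain
    token = base64.b64encode(f"{user}:{password}".encode("latin-1"))
    headers["Authorization"] = "Basic " + token.decode("ascii")
    return headers


result = apply_basic_auth(FakeSpider(), "http://example.com/page", {})
```

The point of the sketch is that the header is attached per request, which is why the URL itself can stay free of credentials.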

How do I use scrapy shell with a URL and basic authentication credentials?
