So it seems https://packetstream.io/ has been failing everywhere my spiders use the proxy service. I contacted them and they said they haven't had any service interruptions. I keep getting this error message:
Retrying <GET https://www.oddschecker.com/us/boxing-mma> (failed 2 times): User timeout caused connection failure: Getting https://www.oddschecker.com/us/boxing-mma took longer than 180.0 seconds..
Middleware settings (settings.py):
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
    'sfb.middlewares.SurefirebettingDownloaderMiddleware': 543,
    'sfb.middlewares.CustomProxyMiddleware': 350,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 400,
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy_useragents.downloadermiddlewares.useragents.UserAgentsMiddleware': 500,
}
Proxy middleware (middlewares.py):
from w3lib.http import basic_auth_header

class CustomProxyMiddleware(object):
    def process_request(self, request, spider):
        request.meta["proxy"] = "https://proxy.packetstream.io:port"
        request.headers["Proxy-Authorization"] = basic_auth_header("username",
                                                                   "API key")
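For what it's worth, `basic_auth_header` (from w3lib, a Scrapy dependency) just builds a standard HTTP Basic auth value, so you can sanity-check the header the middleware sends without running Scrapy at all. This is a minimal stdlib equivalent; "user" and "secret" are placeholder credentials, not real ones:

```python
# Stdlib equivalent of w3lib.http.basic_auth_header for ASCII credentials:
# the value is "Basic " followed by base64("username:password").
import base64

def basic_auth_value(username: str, password: str) -> bytes:
    creds = f"{username}:{password}".encode("latin-1")
    return b"Basic " + base64.b64encode(creds)

print(basic_auth_value("user", "secret"))  # b'Basic dXNlcjpzZWNyZXQ='
```

Printing the value your middleware produces and comparing it against what the provider's dashboard or docs expect rules out a malformed credential string as the cause.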
Spider:
import scrapy
from bs4 import BeautifulSoup

class OddscheckerSpider(scrapy.Spider):
    name = 'oddschecker'
    allowed_domains = []
    start_urls = ["https://www.oddschecker.com/us/boxing-mma"]

    def parse(self, response):
        soup = BeautifulSoup(response.text, "lxml")
It doesn't look like my proxy was simply banned by this one site, because all of my spiders now fail whenever they use the proxy service. If I comment out the proxy settings and middleware, everything works fine. Any ideas?
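One way to narrow this down is to exercise the same proxy outside Scrapy entirely. Below is a stdlib-only sketch; the hostname, port (8080 here), and credentials are placeholders standing in for your real PacketStream values. Note the `http://` scheme on the proxy URL itself: some providers only accept the client-side connection over plain HTTP even for HTTPS targets, so trying both schemes is a cheap experiment when debugging timeouts.

```python
# Minimal check of a proxy outside Scrapy, using only the standard library.
# Hostname, port, and credentials below are placeholders, not real values.
import urllib.request

def build_proxy_opener(user: str, api_key: str, host: str, port: int) -> urllib.request.OpenerDirector:
    # http:// scheme on the proxy URL itself; the target URL can still be HTTPS.
    proxy_url = f"http://{user}:{api_key}@{host}:{port}"
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)

opener = build_proxy_opener("username", "APIkey", "proxy.packetstream.io", 8080)

# To actually exercise the proxy (this hangs the same way the spider does
# if the proxy endpoint is unreachable):
#   with opener.open("https://www.oddschecker.com/us/boxing-mma", timeout=30) as resp:
#       print(resp.status)
```

If this plain request also times out, the problem is between you and the proxy endpoint (credentials, port, scheme, or network), not Scrapy; if it succeeds, the issue is in the Scrapy configuration, and the `https://` scheme in `request.meta["proxy"]` is the first thing I'd swap for `http://`.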