My proxy middleware stopped working, any ideas?

Asked: 2020-08-05 16:29:04

Tags: python proxy scrapy

So, seemingly out of nowhere, https://packetstream.io/ has stopped working everywhere I use the proxy service in my spiders. I contacted them and they said they have not experienced any service interruptions. I keep getting this error:

Retrying <GET https://www.oddschecker.com/us/boxing-mma> (failed 2 times): User timeout caused connection failure: Getting https://www.oddschecker.com/us/boxing-mma took longer than 180.0 seconds..
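The 180 seconds is Scrapy's default DOWNLOAD_TIMEOUT, so the requests hang until that limit rather than being refused outright. While debugging, lowering the timeout and retry count makes a dead proxy fail fast instead of stalling for minutes per request; a minimal sketch for settings.py (the values here are illustrative, not my real config):

DOWNLOAD_TIMEOUT = 30  # default is 180 seconds
RETRY_TIMES = 1        # default is 2 retries per failed request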

Middleware:

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
    'sfb.middlewares.SurefirebettingDownloaderMiddleware': 543,
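    # 350 < 400, so this runs before HttpProxyMiddleware and sets request.meta["proxy"] in time.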
    'sfb.middlewares.CustomProxyMiddleware': 350,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 400,
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy_useragents.downloadermiddlewares.useragents.UserAgentsMiddleware': 500
}

The custom proxy middleware (in sfb/middlewares.py):

from w3lib.http import basic_auth_header

class CustomProxyMiddleware(object):
    def process_request(self, request, spider):
        # "port", "username", and "API key" are placeholders for my real credentials.
        request.meta["proxy"] = "https://proxy.packetstream.io:port"
        request.headers["Proxy-Authorization"] = basic_auth_header("username",
                                                                   "API key")

Spider:

import scrapy
from bs4 import BeautifulSoup

class OddscheckerSpider(scrapy.Spider):
    name = 'oddschecker'
    allowed_domains = []
    start_urls = ["https://www.oddschecker.com/us/boxing-mma"]

    def parse(self, response):
        soup = BeautifulSoup(response.text, "lxml")

It's not as if my proxy was just banned by this one site, because none of my spiders work now when using the proxy service. If I comment out the proxy settings and the middleware, though, everything works fine. Any ideas?
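A sketch of how the underlying failure could be surfaced: attaching an errback to the request logs the root Twisted failure instead of just the retry messages (the spider name and on_error helper here are hypothetical, for illustration only):

import scrapy

class ProxyDebugSpider(scrapy.Spider):
    name = 'proxy_debug'

    def start_requests(self):
        yield scrapy.Request(
            "https://www.oddschecker.com/us/boxing-mma",
            callback=self.parse,
            errback=self.on_error,  # called on timeouts and connection errors
        )

    def parse(self, response):
        self.logger.info("Got %s through the proxy", response.status)

    def on_error(self, failure):
        # failure is a twisted.python.failure.Failure wrapping the root cause
        self.logger.error("Request failed: %r", failure)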

0 Answers:

No answers yet.