Question

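I'm using proxy rotation in my project to avoid getting banned from a website. I have to crawl a list of URLs from http://website/0001 to http://website/9999, and when the site detects that I'm scraping, it sends me to website/contact.html.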

I already have my proxy list in the settings:

    ROTATING_PROXY_LIST = [ 'proxy1.com:8000', 'proxy2.com:8031', # ... ]

I wrote this spider:

    next_page_url = response.url[17:]  # getting the relative url from website/page

    if next_page_url == "contact.html":
        absolute_next_page = response.urljoin(last_page)
        yield Request(absolute_next_page)
        # should try the same page with a different proxy
    else:
        next_page_url = int(next_page_url) + 1
        last_page = str(next_page_url).zfill(4)
        absolute_next_page = response.urljoin(last_page)
        yield Request(absolute_next_page)

But it raises an error: UnboundLocalError: local variable 'last_page' referenced before assignment

How can I tell this spider that a proxy is dead? Or is there another way to achieve the same thing?

Answer 1

What exactly are you asking?

You say you get this error:

UnboundLocalError: local variable 'last_page' referenced before assignment

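This error means you are trying to use a variable that has not been assigned yet. In your spider, last_page is only assigned in the else branch, but the contact.html branch reads it.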
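To see the error in isolation, here is a minimal sketch outside Scrapy. The function name `pick_page` and its `blocked` flag are made up for illustration; only the branch structure mirrors the spider in the question.

```python
def pick_page(blocked):
    # Python treats last_page as a local variable because it is assigned
    # somewhere in this function, even though only the else branch assigns it.
    if blocked:
        return last_page  # read before any assignment on this path
    else:
        last_page = "0002"
        return last_page


try:
    pick_page(blocked=True)
except UnboundLocalError as exc:
    # Same exception type as in the question
    error_name = type(exc).__name__
    print(error_name)
```

The non-blocked path works because the assignment runs before the read; the blocked path fails the first time it is reached.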

To avoid this error, change the code to something like this:

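    import random  # at the top of the spider module

    next_page_url = response.url[17:]  # getting the relative url from website/page

    if next_page_url == "contact.html":
        # Redirected to the contact page: retry the page that was originally
        # requested. Assumption: the site uses a normal redirect, so Scrapy's
        # redirect middleware has recorded that URL in meta['redirect_urls'].
        original_url = response.meta.get('redirect_urls', [response.url])[0]

        # dont_filter=True so the duplicate filter does not drop the retry
        req = Request(url=original_url, dont_filter=True)
        req.meta['proxy'] = random.choice(ROTATING_PROXY_LIST)  # choose a random proxy

        yield req
    else:
        # Normal page: compute the next page number only on this branch,
        # so int() never sees "contact.html"
        next_page_url = int(next_page_url) + 1
        last_page = str(next_page_url).zfill(4)
        absolute_next_page = response.urljoin(last_page)

        yield Request(absolute_next_page)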
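As a side note, slicing with `response.url[17:]` depends on the exact length of the site's base URL. A possible alternative, sketched here with the standard library only, is to take the last path segment instead. The helper names `page_name`, `is_ban_page` and `next_page_url` are hypothetical, and the four-digit zero padding is taken from the URLs in the question.

```python
from urllib.parse import urljoin, urlparse

BAN_PAGE = "contact.html"  # page the site redirects to, per the question


def page_name(url):
    """Return the last path segment, e.g. '0001' or 'contact.html'."""
    return urlparse(url).path.rsplit("/", 1)[-1]


def is_ban_page(url):
    """True when the URL points at the contact page."""
    return page_name(url) == BAN_PAGE


def next_page_url(url, width=4):
    """Build the URL of the following page, keeping the zero padding."""
    number = int(page_name(url)) + 1
    return urljoin(url, str(number).zfill(width))
```

Inside the spider, `is_ban_page(response.url)` could replace the string comparison, and `next_page_url(response.url)` could replace the `int` / `zfill` / `urljoin` steps.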

Use the scrapy-rotating-proxies package.
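For reference, a minimal settings sketch for that package, based on its README as far as I recall it; check the middleware paths and priorities against the version you install. The proxy addresses are the placeholders from the question.

```python
# settings.py (sketch)

# Proxies the middleware rotates through (placeholders from the question)
ROTATING_PROXY_LIST = [
    "proxy1.com:8000",
    "proxy2.com:8031",
]

# Enable proxy rotation and ban detection
DOWNLOADER_MIDDLEWARES = {
    "rotating_proxies.middlewares.RotatingProxyMiddleware": 610,
    "rotating_proxies.middlewares.BanDetectionMiddleware": 620,
}
```

The package's default ban detection looks mainly at response status codes and exceptions, so a redirect to contact.html may not count as a ban out of the box. The `ROTATING_PROXY_BAN_POLICY` setting lets you point to a custom policy class if you need the contact page to mark a proxy as dead.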
