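I use proxy rotation in my project to avoid being banned from a website. I have to scrape a list of URLs from http://website/0001 to http://website/9999, and when the site detects that I am scraping, it sends me to website/contact.html.
I already have my proxy list in the settings: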
ROTATING_PROXY_LIST = [
'proxy1.com:8000',
'proxy2.com:8031',
# ...
]
I created this spider:
next_page_url = response.url[17:]  # get the relative URL from website/page
if next_page_url == "contact.html":
    absolute_next_page = response.urljoin(last_page)
    yield Request(absolute_next_page)
    # should try the same page with a different proxy
else:
    next_page_url = int(next_page_url) + 1
    last_page = str(next_page_url).zfill(4)
    absolute_next_page = response.urljoin(last_page)
    yield Request(absolute_next_page)
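But it raises an error: UnboundLocalError: local variable 'last_page' referenced before assignment
How can I tell this spider that a proxy is dead? Or is there another way to do the same thing?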
Answer 0 (score: 0)
What exactly are you asking?
You say you get this error:
UnboundLocalError: local variable 'last_page' referenced before assignment
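This error means that you are trying to use a variable that has not been initialized yet. In your spider, last_page is assigned only in the else branch, so it does not exist when the first response the callback sees is contact.html.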
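To see the problem in isolation, here is a minimal sketch without Scrapy. The function name next_page is hypothetical; it only mimics the branch structure of the spider, where the variable is read in one branch and assigned in the other.

```python
def next_page(relative_url):
    """Mimic the if/else structure of the spider callback (no Scrapy needed)."""
    if relative_url == "contact.html":
        # last_page is read here, but only the else branch ever assigns it.
        return last_page
    else:
        last_page = str(int(relative_url) + 1).zfill(4)
        return last_page


print(next_page("0001"))  # the else branch assigns last_page first, so this works

try:
    next_page("contact.html")
except UnboundLocalError as exc:
    # The exact wording of the message depends on the Python version.
    print("UnboundLocalError:", exc)
```

Because last_page is assigned somewhere in the function, Python treats it as a local variable everywhere in that function, and it keeps no value between calls.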
To prevent this error, change the code to this:
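# Needs "import random" and "from scrapy import Request" at the top of the spider;
# ROTATING_PROXY_LIST must be imported from your settings module.
page = response.url[17:]  # relative part of the URL, e.g. "0001" or "contact.html"
if page == "contact.html":
    # Blocked. Assuming the site sent an HTTP redirect, Scrapy's RedirectMiddleware
    # keeps the originally requested URL in meta['redirect_urls'].
    original_url = response.meta['redirect_urls'][0]
    # dont_filter=True, because the duplicate filter has already seen this URL
    req = Request(url=original_url, dont_filter=True)
    # choose a random proxy; a scheme is added because the list entries have none
    req.meta['proxy'] = 'http://' + random.choice(ROTATING_PROXY_LIST)
    yield req
else:
    # Normal page: compute the next page number only here, where it is a number
    next_page = int(page) + 1
    last_page = str(next_page).zfill(4)
    absolute_next_page = response.urljoin(last_page)
    yield Request(absolute_next_page)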
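As for marking a proxy as dead, one possible approach is to keep track of live proxies yourself and drop a proxy whenever it lands on the block page. The sketch below is plain Python, not Scrapy API: ProxyPool and plan_next are hypothetical names, and it assumes you know which page you requested and which proxy you used (for example by storing both in the request meta).

```python
import random

BLOCK_PAGE = "contact.html"  # page the site sends you to when it detects scraping


class ProxyPool:
    """Hypothetical helper that tracks which proxies are still considered alive."""

    def __init__(self, proxies, rng=None):
        self.alive = list(proxies)
        self.rng = rng or random.Random()

    def mark_dead(self, proxy):
        # Drop a proxy that got us sent to the block page.
        if proxy in self.alive:
            self.alive.remove(proxy)

    def pick(self):
        if not self.alive:
            raise RuntimeError("no live proxies left")
        return self.rng.choice(self.alive)


def plan_next(requested_page, landed_page, used_proxy, pool):
    """Return (page, proxy) for the next request.

    requested_page: the page that was asked for, e.g. "0001"
    landed_page:    the page that came back, e.g. "0001" or "contact.html"
    """
    if landed_page == BLOCK_PAGE:
        pool.mark_dead(used_proxy)
        return requested_page, pool.pick()  # same page, different proxy
    next_page = str(int(landed_page) + 1).zfill(4)
    return next_page, used_proxy  # next page, keep the working proxy


pool = ProxyPool(["proxy1.com:8000", "proxy2.com:8031"])
print(plan_next("0001", "0001", "proxy1.com:8000", pool))
# ('0002', 'proxy1.com:8000')
print(plan_next("0002", "contact.html", "proxy1.com:8000", pool))
# ('0002', 'proxy2.com:8031')
```

In the spider, the returned page would go through response.urljoin and the returned proxy into req.meta['proxy'], as in the code above.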