Switching proxies in Python: pseudocode

Time: 2017-11-06 23:22:59

Tags: python switch-statement python-requests proxies

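Suppose I have a website I want to scrape, e.g. cheapoair.com.

I want to use normal requests in Python to scrape the data on, say, the first page. If I end up being blocked by the server, I want to switch to a proxy. I have a list of proxy servers and a method that picks one, and I also have a list of user agent strings. However, I think I need help thinking the problem through.

For reference, uagen() returns a user agent string

proxit() returns a proxy

This is what I have so far: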

import requests
from proxy_def import *
from http import cookiejar
import time
from socket import error as SocketError
import sys

start_time = time.time()


class BlockAll(cookiejar.CookiePolicy):
    return_ok = set_ok = domain_return_ok = path_return_ok = lambda self, *args, **kwargs: False
    netscape = True
    rfc2965 = hide_cookie2 = False


headers = {'User-Agent': uagen()}

print(headers)

s = requests.Session()
s.cookies.set_policy(BlockAll())  # set_policy expects a policy instance, not the class
cookies = {'SetCurrency': 'USD'}
sp = proxit()
for i in range(100000000000):
    while True:
        try:
            print('trying on ', sp)
            print('with user agent headers', headers)
            s.proxies = {"http": sp}
            r = s.get("http://www.cheapoair.com", headers=headers, timeout=15, cookies=cookies)
            print(i, sp, 'success')
            print("--- %s seconds ---" % (time.time() - start_time))
        except (SocketError, requests.ConnectionError, requests.Timeout) as e:
            # Any connection-level failure: rotate the proxy and user agent, then retry.
            print('passing ', sp)
            sp = proxit()
            headers = {'User-Agent': uagen()}
            print('this is the new proxy ', sp)
            print('this is the new headers ', headers)
            continue
        except KeyboardInterrupt:
            print("The program has been terminated")
            sys.exit(1)
        break

#print(r.text)
print('all done',
      '\n')

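What I'm looking for is how to say: start with a normal request (not through a proxy), and if it ends in an error (for example, being rejected by the server), switch to a proxy and try again.

I can almost picture it, but I can't quite see how to write it.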

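I was thinking that if I put a variable that updates sp after

for i in range(1000000000000):

but before while True:, it might work. Another possibility is to declare s.proxies = {"http": ""} and then, if I run into an error, switch to s.proxies = {"http": proxit()} or s.proxies = {"http": sp}.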

Thanks!

1 answer:

Answer 0: (score: 1)

I figured it out.

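while True:
    try:
        # do this thing (make the request)
        # but remove the proxy variable from here and declare it before "while True"
    except SocketError as e:
        # switch headers (user agent string) and proxy, then retry
        s.proxies = {"http": proxit()}
        continue
    break  # the request succeeded, so leave the retry loop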

This refreshes the variable after an error is received from the server.
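
For anyone who wants something closer to runnable, here is a minimal sketch of the same idea: start with a direct request and move to a proxy only after a failure. It is one possible arrangement, not the only one. The function name fetch_with_fallback, the max_attempts limit, and the BLOCK_STATUSES set are my own assumptions and do not come from the question; next_proxy and next_user_agent stand in for the question's proxit() and uagen(). The status check is there because requests does not raise an exception for a 403 or 429 response on its own, so a block by the server may never reach an except clause. Setting the proxy for "https" as well as "http" is also an assumption; the question only sets "http".

```python
# Assumption: the statuses the target site sends when it blocks a client.
BLOCK_STATUSES = {403, 429}


def fetch_with_fallback(session, url, next_proxy, next_user_agent,
                        max_attempts=5, timeout=15):
    """Try a direct request first; after any failure, retry through a proxy.

    session         -- a requests.Session-like object (has .proxies and .get)
    next_proxy      -- callable returning a proxy URL, e.g. the question's proxit()
    next_user_agent -- callable returning a UA string, e.g. the question's uagen()
    """
    session.proxies = {}  # an empty mapping means no explicit proxy
    headers = {"User-Agent": next_user_agent()}
    for attempt in range(1, max_attempts + 1):
        try:
            response = session.get(url, headers=headers, timeout=timeout)
            if response.status_code not in BLOCK_STATUSES:
                return response
            reason = "HTTP %d" % response.status_code
        except OSError as exc:
            # requests.RequestException (ConnectionError, Timeout, ...) and
            # socket.error are all subclasses of OSError in Python 3.
            reason = repr(exc)
        # The attempt failed: rotate proxy and user agent before retrying.
        proxy = next_proxy()
        session.proxies = {"http": proxy, "https": proxy}
        headers = {"User-Agent": next_user_agent()}
        print("attempt %d failed (%s); switching to %s" % (attempt, reason, proxy))
    raise RuntimeError("no successful response after %d attempts" % max_attempts)
```

With the question's helpers this would be called as fetch_with_fallback(requests.Session(), "http://www.cheapoair.com", proxit, uagen). Capping the number of attempts avoids looping forever when every proxy in the list is dead.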