Question

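I have some Python code that visits a website and then exits. What I want to know is how to add a proxy list to it, and which command to use. I have a proxy list containing IP and port pairs, and I want the script to read them one by one.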

Example proxylist.txt:

111.68.103.39:3128
83.246.226.42:8080
196.20.65.211:8080
203.91.39.23:8080
110.34.39.58:8080
24.64.94.112:8080
190.192.125.141:8080
122.155.13.14:8001
200.54.92.187:3128
62.84.13.33:8080
200.80.30.155:3128
190.95.246.3:3128
62.97.116.178:443

All of these proxies are saved in a file named proxylist.txt. Now I want to use them in the script below:

#!/usr/bin/env python

#Disable some warnings
import logging
logging.getLogger("mechanize").setLevel(logging.ERROR)
import mechanize

# Setup some variables
url = 'http://www.google.com/'
proxy = "proxylist.txt"
ua = 'Mozilla/5.0 (X11; Linux i686 on x86_64; rv:8.0) Gecko/20100101 Firefox/8.0'

# Setup the browser instance
br = mechanize.Browser()
br.addheaders = [('User-agent', ua)]
br.set_handle_gzip(True)
br.set_handle_equiv(True)
br.set_handle_referer(True)
br.set_handle_robots(False)

# Configure the proxy

proxylist = proxy.split(':')
proxytype = proxylist[0]
proxyserv = proxylist[1]
proxyport = proxylist[2]
proxyline = proxyserv + ':' + proxyport
proxydict = {proxytype: proxyline}
br.set_proxies(proxydict)

# Get the URL
print('Retrieving the URL ' + url + '...')
response = br.open(url)

# Get the returned HTML
html = response.read()

# Close the browser instance
br.close()

# Print the HTML
print(html)

The script must loop over the file and go through each proxy one by one.

Answer 1

proxies = []  # list of 'ip:port' strings, in file order
with open('proxylist.txt') as proxylist:  # open the proxy list file
    for line in proxylist:  # iterate through all lines in the file
        line = line.strip()  # drop the trailing newline
        if line:  # skip blank lines
            proxies.append(line)  # each entry is already in 'ip:port' form
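
Because proxylist.txt holds only `ip:port` pairs with no proxy type, a slightly more defensive parser can be useful when the file might contain blank or malformed lines. The following is a minimal sketch, not part of the original answer: the helper names `parse_proxy_line` and `load_proxies`, the default scheme of `http`, and the optional `scheme://` prefix are all assumptions made for illustration.

```python
def parse_proxy_line(line, default_scheme="http"):
    """Return (scheme, 'host:port') for one line, or None if it is blank or malformed."""
    text = line.strip()
    if not text:
        return None
    scheme = default_scheme
    # Assumption: a line may optionally carry a prefix such as 'https://'
    if "://" in text:
        scheme, text = text.split("://", 1)
    # Split on the last colon so the port is always the final field
    host, sep, port = text.rpartition(":")
    if not sep or not host or not port.isdigit():
        return None
    return scheme, host + ":" + port


def load_proxies(lines):
    """Parse an iterable of lines (for example an open file) into a list of proxies."""
    parsed = (parse_proxy_line(line) for line in lines)
    return [entry for entry in parsed if entry is not None]
```

Passing an open file object to `load_proxies` works because files iterate line by line; each returned pair can be turned into the dict that `br.set_proxies` expects with `{scheme: hostport}`.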

Putting that into the original script, the complete code is:

#!/usr/bin/env python

#Disable some warnings
import logging
logging.getLogger("mechanize").setLevel(logging.ERROR)
import mechanize

# Setup some variables
url = 'http://www.google.com/'
proxy = "proxylist.txt"
ua = 'Mozilla/5.0 (X11; Linux i686 on x86_64; rv:8.0) Gecko/20100101 Firefox/8.0'

# Setup the browser instance
br = mechanize.Browser()
br.addheaders = [('User-agent', ua)]
br.set_handle_gzip(True)
br.set_handle_equiv(True)
br.set_handle_referer(True)
br.set_handle_robots(False)

# Read the proxies ('ip:port' per line), skipping blank lines
with open(proxy) as proxylist:
    proxies = [line.strip() for line in proxylist if line.strip()]

# Visit the URL once through each proxy in turn
for proxyline in proxies:
    # The file lists no proxy type, so HTTP is assumed here
    br.set_proxies({'http': proxyline})

    # Get the URL
    print('Retrieving the URL ' + url + ' via ' + proxyline + '...')
    try:
        response = br.open(url)
    except Exception as e:
        # A dead or refusing proxy should not stop the loop
        print('Proxy ' + proxyline + ' failed: ' + str(e))
        continue

    # Get the returned HTML
    html = response.read()

    # Print the HTML
    print(html)

# Close the browser instance
br.close()
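
Public proxies are often dead or slow, so a common variation is to stop at the first proxy that responds instead of visiting the URL through every one. The sketch below is one way to structure that under stated assumptions: the function names `fetch_with_mechanize` and `first_successful` and the 10-second timeout are made up for illustration, and the fetch step is passed in as a parameter so the rotation logic can be exercised without a network.

```python
def fetch_with_mechanize(url, proxy, timeout=10.0):
    """Fetch url through one 'host:port' HTTP proxy using a fresh mechanize browser."""
    import mechanize  # imported lazily so the rotation logic has no hard dependency

    br = mechanize.Browser()
    br.set_handle_robots(False)
    br.set_proxies({"http": proxy})  # assumption: every listed proxy speaks HTTP
    try:
        return br.open(url, timeout=timeout).read()
    finally:
        br.close()


def first_successful(url, proxies, fetch=fetch_with_mechanize):
    """Try each proxy in order; return (proxy, body) for the first that works, else None."""
    for proxy in proxies:
        try:
            return proxy, fetch(url, proxy)
        except Exception as exc:  # deliberately broad: dead proxies fail in many ways
            print("Proxy %s failed: %s" % (proxy, exc))
    return None
```

With the list read from proxylist.txt, a call such as `first_successful(url, proxies)` would return the working proxy together with the page body, or `None` if every proxy failed.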
