I have some Python code that visits a website and then exits. What I need now is the command to add a proxy list: I have a file of proxies (IP and port pairs) and I want the script to read and use them one by one.
Example proxylist.txt:
111.68.103.39:3128
83.246.226.42:8080
196.20.65.211:8080
203.91.39.23:8080
110.34.39.58:8080
24.64.94.112:8080
190.192.125.141:8080
122.155.13.14:8001
200.54.92.187:3128
62.84.13.33:8080
200.80.30.155:3128
190.95.246.3:3128
62.97.116.178:443
All of these proxies are saved in a file named proxylist.txt. Now I want to add them to the script given below:
#!/usr/bin/env python
#Disable some warnings
import logging
logging.getLogger("mechanize").setLevel(logging.ERROR)
import mechanize
# Setup some variables
url = 'http://www.google.com/'
proxy = "proxylist.txt"
ua = 'Mozilla/5.0 (X11; Linux i686 on x86_64; rv:8.0) Gecko/20100101 Firefox/8.0'
# Setup the browser instance
br = mechanize.Browser()
br.addheaders = [('User-agent', ua)]
br.set_handle_gzip(True)
br.set_handle_equiv(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
# Configure the proxy
proxylist = proxy.split(':')
proxytype = proxylist[0]
proxyserv = proxylist[1]
proxyport = proxylist[2]
proxyline = proxyserv + ':' + proxyport
proxydict = {proxytype: proxyline}
br.set_proxies(proxydict)
# Get the URL
print 'Retrieving the URL ' + url + '...'
response = br.open(url)
# Get the returned HTML
html = response.read()
# Close the browser instance
br.close()
# Print the HTML
print html
The script must loop through the file so that each proxy is tried one by one.
Answer 0 (score: 0)
proxydict = {}  # create an empty dict
with open('proxylist.txt') as proxylist:  # open the proxylist file
    for line in proxylist:  # iterate through all lines in the file
        proxyserv, proxyport = line.strip().split(':')  # each line holds "ip:port"
        proxydict['http'] = proxyserv + ':' + proxyport  # add the proxy to the dict
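As a standalone sketch, the per-line parsing can be pulled into a small helper. The function name `parse_proxy_line` and the hard-coded `'http'` scheme are assumptions here: proxylist.txt stores bare `ip:port` pairs with no scheme, so plain HTTP is assumed.

```python
def parse_proxy_line(line):
    """Turn one "ip:port" line from proxylist.txt into a mechanize proxy dict."""
    host, port = line.strip().split(':')
    # The file carries no scheme prefix, so plain HTTP is assumed
    return {'http': host + ':' + port}

print(parse_proxy_line('111.68.103.39:3128'))  # -> {'http': '111.68.103.39:3128'}
```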
So the complete code is:
#!/usr/bin/env python
# Disable some warnings
import logging
logging.getLogger("mechanize").setLevel(logging.ERROR)
import mechanize

# Setup some variables
url = 'http://www.google.com/'
proxyfile = "proxylist.txt"
ua = 'Mozilla/5.0 (X11; Linux i686 on x86_64; rv:8.0) Gecko/20100101 Firefox/8.0'

# Setup the browser instance
br = mechanize.Browser()
br.addheaders = [('User-agent', ua)]
br.set_handle_gzip(True)
br.set_handle_equiv(True)
br.set_handle_referer(True)
br.set_handle_robots(False)

# Try every proxy in the list, one by one
with open(proxyfile) as proxylist:
    for line in proxylist:
        proxyserv, proxyport = line.strip().split(':')  # each line holds "ip:port"
        proxydict = {'http': proxyserv + ':' + proxyport}
        br.set_proxies(proxydict)
        # Get the URL through the current proxy
        print 'Retrieving the URL ' + url + ' via ' + proxydict['http'] + '...'
        response = br.open(url)
        # Get and print the returned HTML
        html = response.read()
        print html

# Close the browser instance
br.close()
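One caveat: many proxies in public lists like this are dead, so `br.open()` will often raise an error and abort the loop. A sketch of skip-on-failure logic, factored into a hypothetical helper (`first_working_proxy` and its `fetch` parameter are names invented here) so the control flow can be tested without a network; `fetch` is whatever callable performs the actual request:

```python
def first_working_proxy(proxies, fetch):
    """Try fetch(proxy) for each proxy in turn; return (proxy, result)
    for the first one that succeeds, or None if all of them fail."""
    for proxy in proxies:
        try:
            return proxy, fetch(proxy)
        except IOError:  # network failures such as urllib's URLError subclass IOError/OSError
            continue
    return None
```

With mechanize, `fetch` would set the proxy and open the URL, e.g. a function that calls `br.set_proxies({'http': proxy})` and returns `br.open(url).read()`.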