Right now I have this code:
import json
import urllib2

with open('proxies.txt') as proxies:
    for line in proxies:
        proxy = json.loads(line)
        proxy_handler = urllib2.ProxyHandler(proxy)
        opener = urllib2.build_opener(proxy_handler)
        urllib2.install_opener(opener)
with open('urls.txt') as urls:
    for line in urls:
        url = line.rstrip()
        data = urllib2.urlopen(url).read()
        print data
My proxies.txt file looks like this:
{"https": "https://94.142.27.4:3128"}
{"http": "http://118.97.95.174:8080"}
{"http": "http://66.62.236.15:8080"}
And my urls.txt file looks like this:
http://www.google.com
http://www.facebook.com
http://www.reddit.com
It seems to be installing all of the proxies and then going through each URL in the list with all of the proxies installed. What I really want is for it to access each URL through each proxy individually. So is there a way to do that? Does it already do that? Am I misunderstanding what a proxy really is? Am I misunderstanding what install_opener really does?
Answer 0 (score: 3)
I'm not sure this is exactly what you want, but...
Since you want to try every URL through every proxy, you can use itertools.product to easily build all the combinations:
import itertools

with open('proxies.txt') as proxies:
    with open('urls.txt') as urls:
        for (proxie, url) in itertools.product(proxies, urls):
            print "access", url.rstrip(), "using", proxie.rstrip()
Of course, you will want to put your actual code there instead of the print.
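For instance, a minimal sketch of what that actual code could look like, reusing the urllib2 calls from the question (untested; it builds a throwaway opener for each combination rather than installing a global one):

import itertools
import json
import urllib2

with open('proxies.txt') as proxies:
    with open('urls.txt') as urls:
        for (proxie, url) in itertools.product(proxies, urls):
            # parse the JSON proxy spec for this combination
            proxy = json.loads(proxie)
            # build an opener just for this proxy instead of installing it globally
            opener = urllib2.build_opener(urllib2.ProxyHandler(proxy))
            data = opener.open(url.rstrip()).read()
            print data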
That said, the only real problem with your original code is probably just the indentation. You want nested loops, so this is how you should write it:
with open('proxies.txt') as proxies:
    for line in proxies:
        proxy = json.loads(line)
        proxy_handler = urllib2.ProxyHandler(proxy)
        opener = urllib2.build_opener(proxy_handler)
        urllib2.install_opener(opener)
        with open('urls.txt') as urls:
            for line in urls:
                url = line.rstrip()
                data = urllib2.urlopen(url).read()
                print data
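One practical aside (my addition, not part of the original answer): free proxies like the ones in proxies.txt are frequently dead, and the first failing urlopen will raise and abort the whole run. A hedged sketch of the nested version with a timeout and basic error handling:

import json
import socket
import urllib2

with open('proxies.txt') as proxies:
    for line in proxies:
        proxy = json.loads(line)
        urllib2.install_opener(urllib2.build_opener(urllib2.ProxyHandler(proxy)))
        with open('urls.txt') as urls:
            for url_line in urls:
                url = url_line.rstrip()
                try:
                    # the timeout keeps a dead proxy from hanging the script
                    data = urllib2.urlopen(url, timeout=10).read()
                    print data
                except (urllib2.URLError, socket.error) as e:
                    print "failed to fetch", url, "via", proxy, "-", e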