Here is the script:
import json
import urllib2

with open('urls.txt') as f:
    urls = [line.rstrip() for line in f]

with open('proxies.txt') as proxies:
    for line in proxies:
        proxy = json.loads(line)
        proxy_handler = urllib2.ProxyHandler(proxy)
        opener = urllib2.build_opener(proxy_handler)
        urllib2.install_opener(opener)
        for url in urls:
            data = urllib2.urlopen(url).read()
            print data
This is urls.txt:
http://myipaddress.com
and proxies.txt:
{"https": "https://87.98.216.22:3128"}
{"https": "http://190.153.7.189:8080"}
{"https": "http://125.39.68.181:80"}
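I've been testing it by looking at the terminal output (a bunch of HTML) to see whether it shows an IP address somewhere, hoping it would be one of the proxy IPs. But this doesn't seem to work. Depending on the IP-lookup site, it either throws a connection error or tells me I have to enter a captcha (even though the sites work fine in a browser).
Am I going about this the best way? Is there a simpler way to check which IP address a URL sees?
Edit: Elsewhere (on another forum) I heard that one way to check whether a URL is being accessed from a different IP is to check the cross headers (for example, HTTP headers indicating that the request was redirected). But I couldn't find any more information about it.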
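On the header idea: a server can only see forwarding information if the proxy adds it. Transparent (non-anonymous) proxies commonly add headers such as `Via`, `X-Forwarded-For`, or the standardised `Forwarded` header (RFC 7239), while anonymous proxies deliberately omit them, so the absence of these headers proves nothing on its own. A minimal sketch, assuming you already have the request headers as the server saw them (for example from a service that echoes them back); `forwarding_headers` and the header list are illustrative choices, not an exhaustive set:

```python
# Headers that proxies commonly add to a forwarded request. This list is an
# illustrative assumption, not an exhaustive or authoritative set.
FORWARDING_HEADERS = ("Via", "X-Forwarded-For", "Forwarded", "X-Real-IP")


def forwarding_headers(echoed_headers):
    """Return the forwarding-related headers present in `echoed_headers`.

    `echoed_headers` is a dict of header name -> value as seen by the server.
    HTTP header names are case-insensitive, so matching ignores case; the
    result uses the spelling from FORWARDING_HEADERS.
    """
    lowered = {name.lower(): value for name, value in echoed_headers.items()}
    return {
        name: lowered[name.lower()]
        for name in FORWARDING_HEADERS
        if name.lower() in lowered
    }


if __name__ == "__main__":
    seen = {
        "Host": "example.com",
        "Via": "1.1 proxy.example",
        "x-forwarded-for": "203.0.113.7",
    }
    print(forwarding_headers(seen))
```

An empty result means only that no known forwarding header was present, not that the request came directly from your machine.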
Answer (score: 2)
You can use a simpler site that returns only the IP address, such as the one in urls.txt below. For example:
Code:
import json
import urllib2

with open('urls.txt') as f:
    urls = [line.rstrip() for line in f]

with open('proxies.txt') as proxies:
    for line in proxies:
        proxy = json.loads(line)
        proxy_handler = urllib2.ProxyHandler(proxy)
        opener = urllib2.build_opener(proxy_handler)
        urllib2.install_opener(opener)
        for url in urls:
            try:
                data = urllib2.urlopen(url).read()
                print proxy, "-", data
            except Exception:  # catch request failures but still allow Ctrl-C
                print proxy, "- not working"
urls.txt:
http://api.exip.org/?call=ip
proxies.txt:
{"http": "http://218.108.114.140:8080"}
{"http": "http://59.47.43.93:8080"}
{"http": "http://218.108.170.172:80"}
Output:
{u'http': u'http://218.108.114.140:8080'} - 218.108.114.140
{u'http': u'http://59.47.43.93:8080'} - 118.207.240.161
{u'http': u'http://218.108.170.172:80'} - not working
[Finished in 25.4s]
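The output also shows that the IP a site reports is not always the proxy's own address: the second proxy listens on 59.47.43.93, but the site saw 118.207.240.161, presumably because that proxy sends traffic out through a different address. A small sketch for classifying each result line mechanically; `classify`, its labels, and the optional `own_ip` argument are hypothetical:

```python
from urllib.parse import urlsplit


def classify(proxy_url, reported_ip, own_ip=None):
    """Compare the IP a what-is-my-IP service reported with the proxy's host.

    Returns "not working" when no IP came back, "own ip" when the service saw
    `own_ip` (i.e. the proxy was bypassed), "proxy ip" when it saw the proxy's
    own address, and "different exit ip" otherwise.
    """
    if not reported_ip:
        return "not working"
    ip = reported_ip.strip()
    if own_ip is not None and ip == own_ip:
        return "own ip"
    if urlsplit(proxy_url).hostname == ip:
        return "proxy ip"
    return "different exit ip"


if __name__ == "__main__":
    # The three results from the answer's output.
    results = [
        ("http://218.108.114.140:8080", "218.108.114.140"),
        ("http://59.47.43.93:8080", "118.207.240.161"),
        ("http://218.108.170.172:80", None),
    ]
    for proxy_url, ip in results:
        print(proxy_url, "-", classify(proxy_url, ip))
```

Without `own_ip`, "different exit ip" cannot distinguish a chained proxy from a bypassed one, so passing your real address is worthwhile.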
Note: none of these is my real IP.
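For reference, a rough Python 3 equivalent of the answer's loop, using `urllib.request` since urllib2 does not exist in Python 3. Two deliberate changes, both assumptions rather than part of the original answer: each proxy gets its own opener used via `opener.open(...)` instead of `install_opener`, which changes global state, and each request has a timeout so dead proxies fail faster. `check_proxy`, `run`, and the 10-second default are hypothetical choices:

```python
import http.client
import json
import urllib.request


def check_proxy(url, proxy, timeout=10):
    """Fetch `url` through `proxy` and return the response body as text.

    `proxy` is a scheme -> proxy-URL dict such as {"http": "http://host:port"}.
    Returns None if the request fails for any network-level reason.
    """
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxy))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.read().decode("utf-8", "replace").strip()
    # URLError, HTTPError and socket timeouts are all OSError subclasses;
    # HTTPException covers malformed responses from flaky proxies.
    except (OSError, http.client.HTTPException):
        return None


def run(urls_path="urls.txt", proxies_path="proxies.txt"):
    """Check every URL through every proxy listed in the two files."""
    with open(urls_path) as f:
        urls = [line.strip() for line in f if line.strip()]
    with open(proxies_path) as f:
        proxies = [json.loads(line) for line in f if line.strip()]
    for proxy in proxies:
        for url in urls:
            body = check_proxy(url, proxy)
            print(proxy, "-", body if body is not None else "not working")
```

Call `run()` from the directory containing urls.txt and proxies.txt; it prints one line per proxy and URL in the same shape as the answer's output.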
Alternatively, if you want to use http://myipaddress.com, you can use BeautifulSoup to extract the exact HTML element that contains the IP.
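If you would rather not install BeautifulSoup, a stdlib-only alternative (a different technique from the one suggested above) is to pull the first IPv4-looking string out of the page with a regular expression and validate it with the `ipaddress` module. This sketch assumes the page contains the IP as plain text, and it is less precise than selecting the exact element, since any other valid dotted quad earlier in the HTML would match first; `first_ipv4` is a hypothetical name:

```python
import ipaddress
import re

# Four dot-separated groups of 1-3 digits that are not part of a longer
# number; a trailing sentence period after the address is still allowed.
IPV4_CANDIDATE = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?!\.?\d)")


def first_ipv4(html):
    """Return the first valid IPv4 address found in `html`, or None."""
    for match in IPV4_CANDIDATE.finditer(html):
        try:
            return str(ipaddress.IPv4Address(match.group()))
        except ipaddress.AddressValueError:
            continue  # e.g. "999.1.1.1" matches the pattern but is not valid
    return None


if __name__ == "__main__":
    page = "<p>Your IP address is <b>218.108.114.140</b>.</p>"
    print(first_ipv4(page))
```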