Proxy settings for the Python requests module

Date: 2017-08-22 10:47:47

Tags: python proxy python-requests

I'm trying to read the HTML from a URL. I tried the following:

import requests
f = requests.get('http://www.google.com')
print(f.text)

This returned the following traceback:

requests.exceptions.ConnectionError: HTTPConnectionPool(host='www.google.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x03142310>: Failed to establish a new connection: [Errno 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond',))

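So I assumed that my workplace (a university) uses a proxy. I used http://www.whatismyproxy.com/ to get the external IP, guessed that the port was 80, and came up with the following code (IP changed):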

import requests
link = 'http://www.google.com'
# Proxy IP changed before posting; port 80 was a guess.
f = requests.get(link,
                 proxies={"http": "http://123.45.678.910:80"})
print(f.text)
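The proxies dictionary in this code is keyed by URL scheme, so an entry under "http" is only used for http:// URLs. The sketch below is a simplified, hypothetical model of how requests picks an entry (requests has its own helper, requests.utils.select_proxy, which also accepts "scheme://host" and "all" keys); pick_proxy is an illustrative name, not part of requests, and the proxy address is the placeholder from the question.

```python
from urllib.parse import urlparse


def pick_proxy(url, proxies):
    """Return the proxy URL that would be used for `url`, or None.

    Simplified model of how requests matches a proxies dict:
    the most specific key wins, checked in this order.
    """
    parts = urlparse(url)
    scheme, host = parts.scheme, parts.hostname
    if host is None:
        return proxies.get(scheme, proxies.get("all"))
    for key in (f"{scheme}://{host}", scheme, f"all://{host}", "all"):
        if key in proxies:
            return proxies[key]
    return None  # no match: the request would go out directly


# Only an "http" entry, as in the question's code (placeholder address).
http_only = {"http": "http://123.45.678.910:80"}
print(pick_proxy("http://www.google.com", http_only))   # http://123.45.678.910:80
print(pick_proxy("https://www.google.com", http_only))  # None
```

With only an "http" entry, https:// requests bypass the proxy unless one is also set in the environment (requests fills missing keys from variables such as HTTPS_PROXY while trust_env is enabled).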

The request went through, but the HTML it returned is not Google's (the same happens if I change the URL to Twitter):

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
 <head>
  <title>Index of /</title>
 </head>
 <body>
<h1>Index of /</h1>
  <table>
   <tr><th valign="top"><img src="/icons/blank.gif" alt="[ICO]"></th><th><a href="?C=N;O=D">Name</a></th><th><a href="?C=M;O=A">Last modified</a></th><th><a href="?C=S;O=A">Size</a></th><th><a href="?C=D;O=A">Description</a></th></tr>
   <tr><th colspan="5"><hr></th></tr>
<tr><td valign="top"><img src="/icons/unknown.gif" alt="[   ]"></td><td><a href="direct.dat">direct.dat</a></td><td align="right">2013-10-24 18:09  </td><td align="right"> 73 </td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/folder.gif" alt="[DIR]"></td><td><a href="errors/">errors/</a></td><td align="right">2015-01-13 16:15  </td><td align="right">  - </td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/unknown.gif" alt="[   ]"></td><td><a href="filtered.dat">filtered.dat</a></td><td align="right">2015-02-06 13:39  </td><td align="right">3.0K</td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/folder.gif" alt="[DIR]"></td><td><a href="html/">html/</a></td><td align="right">2016-09-30 07:50  </td><td align="right">  - </td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/unknown.gif" alt="[   ]"></td><td><a href="wpad.dat">wpad.dat</a></td><td align="right">2016-03-30 05:16  </td><td align="right">2.5K</td><td>&nbsp;</td></tr>
   <tr><th colspan="5"><hr></th></tr>
</table>
<address>Apache/2.4.10 (Debian) Server at www.google.com Port 80</address>
</body></html>

Is this something I can fix, or is it down to my workplace's network setup (and how can I confirm which)?
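Rather than guessing the proxy from an external site, one way to check whether the machine already has a proxy configured is the standard library's urllib.request.getproxies(). It reads the *_proxy environment variables and, on Windows and macOS, falls back to the system proxy settings; requests consults the same settings by default. A minimal sketch (show_configured_proxies is just an illustrative name):

```python
import urllib.request


def show_configured_proxies():
    """Print and return the proxies the environment/OS is configured with."""
    # Reads *_proxy environment variables; on Windows and macOS it falls
    # back to the system proxy settings when none are set.
    proxies = urllib.request.getproxies()
    if not proxies:
        print("No proxy configured in the environment or system settings.")
    for scheme, address in sorted(proxies.items()):
        print(f"{scheme}: {address}")
    return proxies


if __name__ == "__main__":
    show_configured_proxies()
```

An empty result does not prove there is no proxy: networks that rely on automatic configuration (WPAD/PAC) publish the proxy in a script that this function does not evaluate.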

1 answer:

Answer 0 (score: 0)

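The proxy settings I needed could not be seen from any outside website. I got them from the wpad.dat file, which I found at wpad.myuniversityname.ac. A second useful note: you may need to extend the proxies dictionary to include both http and https settings: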

proxies={"http": "http://123.45.678.910:80", "https": "http://123.45.678.910:80"}
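A wpad.dat file is normally a proxy auto-config (PAC) file: JavaScript defining FindProxyForURL(url, host), which returns strings such as "PROXY host:port; DIRECT". A rough sketch, assuming the file has already been downloaded as text and that its first PROXY entry is the one you want; proxies_from_pac and the sample file contents are hypothetical, and the function only scans the text rather than running the JavaScript, so per-host rules in the file are ignored.

```python
import re

# Matches "PROXY host:port" directives inside a PAC file's return strings.
# This is a rough text scan: it does NOT run FindProxyForURL, so any
# per-host conditions in the file are ignored.
PROXY_RE = re.compile(r"PROXY\s+([A-Za-z0-9.\-]+):(\d+)")


def proxies_from_pac(pac_text):
    """Build a requests-style proxies dict from the first PROXY entry, or None."""
    match = PROXY_RE.search(pac_text)
    if match is None:
        return None  # no PROXY directive found (e.g. everything is DIRECT)
    host, port = match.groups()
    address = f"http://{host}:{port}"
    # Same proxy for both schemes, as suggested in the answer.
    return {"http": address, "https": address}


# Made-up PAC contents for illustration; a real wpad.dat will differ.
sample_pac = """
function FindProxyForURL(url, host) {
    if (isPlainHostName(host)) return "DIRECT";
    return "PROXY proxy.example.ac.uk:3128; DIRECT";
}
"""
print(proxies_from_pac(sample_pac))
# {'http': 'http://proxy.example.ac.uk:3128', 'https': 'http://proxy.example.ac.uk:3128'}
```

The returned dictionary can be passed directly as the proxies argument of requests.get.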