Python program using the urllib module

Time: 2015-08-13 06:17:38

Tags: python

The following program is meant to find the IP address shown on the page http://whatismyipaddress.com/:

import urllib2
import re

response = urllib2.urlopen('http://whatismyipaddress.com/')

p = response.readlines()
for line in p:
    ip = re.findall(r'(\d+\.\d+\.\d+\.\d+)', line)
    print ip

But I can't get it to work, because it raises the following error:

Traceback (most recent call last):
  File "Test.py", line 5, in <module>
  response = urllib2.urlopen('http://whatismyipaddress.com/')
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 154, in urlopen
  return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 437, in open
  response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 550, in http_response
  'http', request, response, code, msg, hdrs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 475, in error
  return self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 409, in _call_chain
  result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 558, in http_error_default
  raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)

urllib2.HTTPError: HTTP Error 403: Forbidden

Does anyone know what changes are needed to get rid of the error and obtain the desired output?

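3 answers:

Answer 0 (score: 3)

HTTP error code 403 tells you that the server, for some reason, refuses to answer your request. In this case I think the cause is the user agent of your query (urllib2 sends its own default, Python-urllib/2.7 on Python 2.7).

You can change the user agent: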

opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
response = opener.open('http://www.whatismyipaddress.com/')

Then your query will work.

But there is no guarantee that this will keep working. The site may decide to block automated queries.
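For readers on Python 3, where urllib2 no longer exists, the same idea can be sketched with urllib.request. This is only a minimal sketch under assumptions: the helper names build_request and fetch are made up for illustration, and the code catches HTTPError so that a 403 is reported instead of ending in a traceback.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent="Mozilla/5.0"):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent="Mozilla/5.0", timeout=10):
    """Fetch url and return the body as text, or None on an HTTP error."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # err.code is the status code, e.g. 403 if the server still refuses.
        print("HTTP error", err.code, err.reason)
        return None
```

Calling fetch('http://whatismyipaddress.com/') would then return the page text, or None if the site still answers with an error status such as 403.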

Answer 1 (score: 0)

Try this:

>>> import urllib2
>>> import re
>>> site= 'http://whatismyipaddress.com/'
>>> hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
...        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
...        'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
...        'Accept-Encoding': 'none',
...        'Accept-Language': 'en-US,en;q=0.8',
...        'Connection': 'keep-alive'}
>>> req = urllib2.Request(site, headers=hdr)
>>> response = urllib2.urlopen(req)
>>> p = response.readlines()
>>> for line in p:
...     ip = re.findall(r'(\d+\.\d+\.\d+\.\d+)', line)
...     print ip

See also: urllib2-httperror-http-error-403-forbidden
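One detail of the extraction step is worth spelling out: in a regular expression an unescaped dot matches any character, so a pattern like \d+.\d+.\d+.\d+ also matches a plain run of digits such as 1234567. Below is a hedged Python 3 sketch of a stricter extraction. The function name extract_ipv4 and the sample text in the usage note are assumptions for illustration, not part of the answers above.

```python
import ipaddress
import re

# Escaped dots match a literal "." instead of any character.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_ipv4(text):
    """Return the valid IPv4 addresses in text, in order, without duplicates."""
    found = []
    for candidate in IPV4_PATTERN.findall(text):
        try:
            # Rejects candidates such as 999.1.1.1 whose octets are out of range.
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue
        if candidate not in found:
            found.append(candidate)
    return found
```

For example, extract_ipv4('<span>Your IP: 203.0.113.7</span>') yields a list containing only 203.0.113.7, and text without dotted quads yields an empty list.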

Answer 2 (score: 0)

You can try requests instead of urllib2.

It is easier to use:

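import requests

url = 'http://whereismyip.com'
header = {'User-Agent': 'curl/7.21.3'}
# Pass the dict as headers=; the second positional argument of requests.get is params.
r = requests.get(url, headers=header)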

You can use curl as the user agent.