Browsing NTLM-protected websites with Python and python-ntlm

Time: 2011-06-21 14:34:13

Tags: python ntlm

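My task is to create a script that logs into the company portal, goes to a specific page, downloads that page, compares it with an earlier version, and then emails a certain person depending on what has changed. The last parts are easy, but the first step is giving me the most trouble.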
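
For reference, the compare step can be sketched with the standard library's difflib. This is a minimal Python 3 sketch under assumptions: page_changes is a hypothetical helper name, and treating any line-level difference as a "change" is an assumption about what should trigger the email.

```python
import difflib
from typing import List


def page_changes(old_page: str, new_page: str) -> List[str]:
    """Return unified-diff lines between two saved copies of a page (empty if unchanged)."""
    return list(
        difflib.unified_diff(
            old_page.splitlines(),
            new_page.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",  # splitlines() already removed the newlines
        )
    )


if __name__ == "__main__":
    old = "<p>Status: open</p>\n<p>Owner: Alice</p>"
    new = "<p>Status: closed</p>\n<p>Owner: Alice</p>"
    changes = page_changes(old, new)
    if changes:  # only send the email when something actually changed
        print("\n".join(changes))
```

An empty list means the page is unchanged, so the email step can simply be skipped.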

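After failing to connect with urllib2 (I'm trying to do this in Python) and about 4 or 5 hours of Googling, I've determined that the reason I can't connect is the NTLM authentication on the site. I've tried a range of different connection approaches found on this site and elsewhere, to no avail. Following the NTLM example, this is what I did: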

import urllib2
from ntlm import HTTPNtlmAuthHandler

user = 'username'  # for a Windows domain account this is usually written DOMAIN\username
password = "password"
url = "https://portal.whatever.com/"

passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, url, user, password)
# create the NTLM authentication handler
auth_NTLM = HTTPNtlmAuthHandler.HTTPNtlmAuthHandler(passman)

# create and install the opener
opener = urllib2.build_opener(auth_NTLM)
urllib2.install_opener(opener)

# create a header
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
header = { 'Connection' : 'Keep-alive', 'User-Agent' : user_agent}

response = urllib2.urlopen(urllib2.Request(url, None, header))
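
The snippet registers the credentials with an HTTPPasswordMgrWithDefaultRealm keyed on url. That manager only hands them back for URLs on the same host and port whose path falls under the registered one, so if the portal redirects to another host before asking for credentials, a lookup for that URL comes back empty. A minimal sketch in Python 3, where the same class lives in urllib.request; the example.com hosts and the DOMAIN\username value are placeholders, and credentials_for is a hypothetical helper.

```python
import urllib.request

# Placeholder portal URL (hypothetical); stands in for "https://portal.whatever.com/".
PORTAL = "https://portal.example.com/"

passman = urllib.request.HTTPPasswordMgrWithDefaultRealm()
# Same call shape as the question: realm None means "use for any realm".
passman.add_password(None, PORTAL, "DOMAIN\\username", "password")


def credentials_for(url):
    """Return the (user, password) pair the manager would supply for url."""
    return passman.find_user_password(None, url)


if __name__ == "__main__":
    # A page under the registered prefix gets the credentials...
    print(credentials_for("https://portal.example.com/reports/page.aspx"))
    # ...but a different host, e.g. a redirect target, gets (None, None).
    print(credentials_for("https://login.example.com/"))
```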

When I run it (with the real username, password, and URL), I get the following:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "ntlm2.py", line 21, in <module>
    response = urllib2.urlopen(urllib2.Request(url, None, header))
  File "C:\Python27\lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 400, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 513, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 432, in error
    result = self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 619, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "C:\Python27\lib\urllib2.py", line 400, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 513, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 432, in error
    result = self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 619, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "C:\Python27\lib\urllib2.py", line 400, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 513, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 438, in error
    return self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 521, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 401: Unauthorized
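
The two http_error_302 frames show that urllib2 followed two redirects before the final 401. One thing worth ruling out, as an assumption rather than something the traceback proves, is that those redirects lead somewhere the credentials or the NTLM exchange don't cover. A minimal Python 3 sketch that records each hop; LoggingRedirectHandler is a hypothetical name, and the usage in the trailing comments needs network access to the real portal.

```python
import urllib.request


class LoggingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Record each redirect (status code, target URL) before following it."""

    def __init__(self):
        super().__init__()
        self.hops = []  # (status code, new URL) pairs, in the order seen

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        self.hops.append((code, newurl))
        # Let the standard handler build the follow-up request as usual.
        return super().redirect_request(req, fp, code, msg, headers, newurl)


# Usage sketch (network required; build_opener replaces the default
# redirect handler because this class subclasses it):
#   redirects = LoggingRedirectHandler()
#   opener = urllib.request.build_opener(redirects)
#   try:
#       opener.open("https://portal.whatever.com/")
#   except urllib.error.HTTPError as e:
#       print(e.code, redirects.hops)
```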

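The most interesting thing to me is that the last line says a 401 error was sent back. From what I've read, a 401 is the first message sent back to the client when NTLM starts. I was under the impression that the whole point of python-ntlm was to handle the NTLM process for me. Is that wrong, or am I just using it incorrectly? Also, I'm not restricted to Python, so if there's an easier way to do this in another language, let me know (from my Googling, there doesn't seem to be). Thanks!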
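
A 401 is indeed part of NTLM over HTTP, but normally the handler consumes it rather than letting it surface: the server first answers with a bare "WWW-Authenticate: NTLM", the client resends on the same kept-alive connection with a base64 type 1 (NEGOTIATE) message, the server replies with a second 401 carrying a type 2 (CHALLENGE) message, and the client's type 3 (AUTHENTICATE) should then get a 200. A 401 that escapes urlopen therefore suggests the exchange never completed. The sketch below (Python 3, standard library only; ntlm_stage is a hypothetical helper) classifies one WWW-Authenticate value by decoding the "NTLMSSP\0" signature and little-endian message-type field that every NTLM message starts with.

```python
import base64
import binascii
import struct

NTLM_SIGNATURE = b"NTLMSSP\x00"  # first 8 bytes of every NTLM message
MESSAGE_TYPES = {1: "type 1 (NEGOTIATE)", 2: "type 2 (CHALLENGE)", 3: "type 3 (AUTHENTICATE)"}


def ntlm_stage(www_authenticate: str) -> str:
    """Describe where in the NTLM handshake one WWW-Authenticate value sits."""
    scheme, _, token = www_authenticate.strip().partition(" ")
    if scheme.lower() != "ntlm":
        return "not NTLM: " + scheme
    token = token.strip()
    if not token:
        # Bare "NTLM": the server invites the client to send a type 1 message.
        return "handshake start"
    try:
        raw = base64.b64decode(token, validate=True)
    except binascii.Error:
        return "malformed token"
    if len(raw) < 12 or raw[:8] != NTLM_SIGNATURE:
        return "malformed token"
    # Bytes 8-11 hold the message type as a little-endian 32-bit integer.
    (msg_type,) = struct.unpack("<I", raw[8:12])
    return MESSAGE_TYPES.get(msg_type, "unknown type %d" % msg_type)
```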

1 answer:

Answer 0 (score: 1)

If the site is using NTLM authentication, the headers attribute of the resulting HTTPError should say so:

>>> try:
...   handle = urllib2.urlopen(req)
... except IOError, e:
...   print e.headers
... 
<other headers>
WWW-Authenticate: Negotiate
WWW-Authenticate: NTLM
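
The same check in Python 3, where urllib2 was split into urllib.request and urllib.error: HTTPError.headers is an email.message.Message (an http.client.HTTPMessage for real responses), and get_all() returns every WWW-Authenticate value instead of just one. offered_auth_schemes and probe are hypothetical helper names; the demo at the bottom builds a fake 401 offline so no server is needed.

```python
import email.message
import io
import urllib.error
import urllib.request


def offered_auth_schemes(err):
    """Return the scheme names from every WWW-Authenticate header of an HTTPError."""
    values = err.headers.get_all("WWW-Authenticate") or []
    # A value is "<scheme>" or "<scheme> <token>"; keep only the scheme name.
    return [v.split(None, 1)[0] for v in values if v.strip()]


def probe(url):
    """Fetch url without credentials; on a 401, list the schemes offered (needs network)."""
    try:
        urllib.request.urlopen(url)
    except urllib.error.HTTPError as err:
        if err.code != 401:
            raise
        return offered_auth_schemes(err)
    return []  # the server did not demand authentication


if __name__ == "__main__":
    # Offline demo: a fake 401 carrying the two headers shown above.
    hdrs = email.message.Message()
    hdrs["WWW-Authenticate"] = "Negotiate"
    hdrs["WWW-Authenticate"] = "NTLM"
    fake = urllib.error.HTTPError("https://portal.example.com/", 401,
                                  "Unauthorized", hdrs, io.BytesIO(b""))
    print(offered_auth_schemes(fake))  # ['Negotiate', 'NTLM']
```

If NTLM is missing from that list (for example, only Negotiate is offered), a handler that only speaks the NTLM scheme has nothing to respond to.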