I want to use mechanize to log in to a page and retrieve some information. However, every attempt to authenticate fails with HTTP error 401, as shown below:
    r = br.open('http://intra')
  File "bui...e\_mechanize.py", line 203, in open
  File "bui...g\mechanize\_mechanize.py", line 255, in _mech_open
mechanize._response.httperror_seek_wrapper: HTTP Error 401: Unauthorized
Here is my code so far:
import mechanize
import cookielib
# Browser
br = mechanize.Browser()
# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
# Browser options
br.set_handle_equiv(True)
# br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
# Follows refresh 0 but not hangs on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
# If the protected site doesn't receive the authentication data, you
# end up with a 401 error in your face
br.add_password('http://intra', 'myusername', 'mypassword')
# User-Agent (this is cheating, ok?)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
# Open some site, let's pick a random one, the first that pops in mind:
# r = br.open('http://google.com')
r = br.open('http://intra')
html = r.read()
# Show the source
print html
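What am I doing wrong? When I visit http://intra (an internal web page) in a browser such as Chrome, a window pops up asking for a username/password, and after that everything works fine.
The dialog that pops up looks like this: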
Answer 0 (score: 2)
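After a lot of research, I managed to find the reason behind this.
It turns out the site uses so-called NTLM authentication, which mechanize does not support. This command helps find out which authentication mechanism a site uses: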
wget -O /dev/null -S http://www.the-site.com/
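wget -S prints the server's response headers; on a 401, the WWW-Authenticate header names the scheme the server expects (for example NTLM or Negotiate rather than Basic). The same check can be sketched in Python 3 as below. auth_schemes and probe are hypothetical helpers, and the parser is deliberately simplified: it assumes one challenge per header line.

```python
import urllib.error
import urllib.request


def auth_schemes(header_values):
    """Return the scheme names (e.g. 'NTLM', 'Basic') found in a list of
    WWW-Authenticate header values, in order, without duplicates.

    Simplification: assumes one challenge per header line, so
    'Basic realm="x", NTLM' on a single line yields only 'Basic'.
    """
    schemes = []
    for value in header_values:
        value = value.strip()
        if not value:
            continue
        # The scheme is the first token; parameters such as realm follow it.
        scheme = value.split(None, 1)[0]
        if scheme not in schemes:
            schemes.append(scheme)
    return schemes


def probe(url):
    """Request url without credentials and return the schemes the server
    offers in its 401 challenge (requires network access)."""
    try:
        urllib.request.urlopen(url)
    except urllib.error.HTTPError as err:
        if err.code == 401:
            return auth_schemes(err.headers.get_all("WWW-Authenticate") or [])
        raise
    return []  # the page did not ask for authentication
```

If the diagnosis above applies, probe("http://intra") would be expected to list NTLM (possibly alongside Negotiate); Basic or Digest would instead point back to mechanize's own add_password.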
So the code was modified a bit:
import sys
import urllib2
import mechanize
from ntlm import HTTPNtlmAuthHandler  # from the python-ntlm package
print("LOGIN...")
user = sys.argv[1]  # NTLM user names usually take the form DOMAIN\username
password = sys.argv[2]
url = sys.argv[3]
# Offer these credentials for any realm under `url`
passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, url, user, password)
# create the NTLM authentication handler
auth_NTLM = HTTPNtlmAuthHandler.HTTPNtlmAuthHandler(passman)
browser = mechanize.Browser()
# Keep every default handler except the robots.txt processor
handlersToKeep = []
for handler in browser.handlers:
    if not isinstance(handler, mechanize._http.HTTPRobotRulesProcessor):
        handlersToKeep.append(handler)
browser.handlers = handlersToKeep
browser.add_handler(auth_NTLM)
# Authenticate against `url` first, then fetch the page of interest
response = browser.open(url)
response = browser.open("http://www.the-site.com")
print(response.read())
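The code above registers the credentials with realm None. A quick Python 3 check (urllib2's password manager lives in urllib.request there; the URLs and credentials below are placeholders) shows what that means: a None realm acts as a wildcard, but matching is still by URI prefix, so the credentials are only offered for URLs under the registered one. This assumes the NTLM handler looks credentials up by request URL, as urllib2's built-in auth handlers do.

```python
from urllib.request import HTTPPasswordMgrWithDefaultRealm

passman = HTTPPasswordMgrWithDefaultRealm()
# realm=None: offer these credentials whatever realm the server names
passman.add_password(None, "http://intra/", "myusername", "mypassword")

# A URL under the registered URI matches, even with an unknown realm
print(passman.find_user_password("Some Realm", "http://intra/reports/index.html"))
# ('myusername', 'mypassword')

# A different host does not match
print(passman.find_user_password(None, "http://www.the-site.com/"))
# (None, None)
```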
Finally, mechanize itself needs a patch to _response.py:
--- _response.py.old 2013-02-06 11:14:33.208385467 +0100
+++ _response.py 2013-02-06 11:21:41.884081708 +0100
@@ -350,8 +350,13 @@
             self.fileno = self.fp.fileno
         else:
             self.fileno = lambda: None
-        self.__iter__ = self.fp.__iter__
-        self.next = self.fp.next
+
+        if hasattr(self.fp, "__iter__"):
+            self.__iter__ = self.fp.__iter__
+            self.next = self.fp.next
+        else:
+            self.__iter__ = lambda self: self
+            self.next = lambda self: self.fp.readline()
 
     def __repr__(self):
         return '<%s at %s whose fp = %r>' % (
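The patch guards against a wrapped file object that lacks __iter__ by falling back to readline(). The same fallback can be sketched in isolation as a hypothetical Python 3 class; ResponseWrapper and ReadlineOnly are illustrative names, not mechanize code.

```python
import io


class ResponseWrapper:
    """Hypothetical wrapper mirroring the patch: iterate the wrapped file
    object directly when it supports iteration, otherwise fall back to
    repeated readline() calls."""

    def __init__(self, fp):
        self.fp = fp

    def __iter__(self):
        if hasattr(self.fp, "__iter__"):
            return iter(self.fp)
        # iter(callable, sentinel) keeps calling readline() until it returns ""
        return iter(self.fp.readline, "")

    def read(self):
        return self.fp.read()


class ReadlineOnly:
    """Stand-in for a file-like object that has readline() but no __iter__."""

    def __init__(self, text):
        self._buf = io.StringIO(text)

    def readline(self):
        return self._buf.readline()


print(list(ResponseWrapper(io.StringIO("a\nb\n"))))  # ['a\n', 'b\n']
print(list(ResponseWrapper(ReadlineOnly("x\ny"))))   # ['x\n', 'y']
```

The sketch defines __iter__ on the class rather than assigning it to the instance, because in Python 3 (and for new-style classes in Python 2) iter() looks special methods up on the type. For a bytes stream, such as a raw HTTP response body, the sentinel would be b"" instead of "".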
Answer 1 (score: 0)
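@theAlse: Did you have to handle the session cookies separately? I used your approach to authenticate against an SSO server, but when I access the main site (ServiceNow) in the second browser.open call, I still get a 401: Unauthorized error.
I added a debug message to mechanize's _response.py file to show which URL is being visited, and was surprised to find a secondary SSO server.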
$ python s3.py
LOGIN...
[_DEBUG] Visiting https://sso.intra.client.com
[_DEBUG] Got past the first open statement.
[_DEBUG] Visiting https://clienteleitsm.service-now.com
[_DEBUG] Visiting <Request for https://ssointra.web.ipc.us.client.com/ssofedi/public/saml2sso?SAMLRequest=lVLB--snipped--&RelayState=https%3a%2f%2fclienteleitsm.service-now.com%2fnavpage.do>
[_DEBUG] Visiting <Request for https://ssointra.web.ipc.us.client.com/ssofedi/redirectjsp/FederationRedirectWDA.jsp?SAMLRequest=lVLBb--snipped--&SMPORTALURL=https%3A%2F%2Fssointra.web.ipc.us.client.com%2Fssofedi%2Fpublic%2Fsaml2sso>
[_DEBUG] Visiting <Request for https://ssointra.web.ipc.us.client.com/SSOI/ntlm/RedirectToWDA.jsp?TYPE=33554433&REALMOID=--snipped--%3D%26RelayState%3dhttps$%3a$%2f$%2fclienteleitsm%2eservice-now%2ecom$%2fnavpage%2edo%26SMPORTALURL%3dhttps$%3A$%2F$%2Fssointra%2eweb%2eipc%2eus%2eclient%2ecom$%2Fssofedi$%2Fpublic$%2Fsaml2sso>
[_DEBUG] Visiting <Request for https://ssointra.web.ipc.us.client.com/SSOI/ntlm/WDAProtectedPage.jsp?Target=HTTPS://ssointra.--snipped--&RelayState=https%3A%2F%2Fclienteleitsm.service-now.com%2Fnavpage.do&SMPORTALURL=https%3A%2F%2Fssointra.web.ipc.us.client.com%2Fssofedi%2Fpublic%2Fsaml2sso>
[_DEBUG] Visiting <Request for https://sso.intra.client.com/siteminderagent/ntlm/creds.ntc?CHALLENGE=&SMAGENTNAME=--snipped--https$%3A$%2F$%2Fssointra%2eweb%2eipc%2eus%2eclient%2ecom$%2Fssofedi$%2Fpublic$%2Fsaml2sso>
[Client-specific page about invalid username and password credential combination follows]
<HTML>
...
</HTML>
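I have snipped many of the redirect URLs after the third debug line. The random strings really are unique: when I paste them into a browser, I get an error page. But if I do the same in Internet Explorer, I don't even see the redirect pages.
Thanks.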
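The debug output above came from editing mechanize's _response.py. A less invasive way to see a redirect chain, sketched here with Python 3's urllib.request (LoggingRedirectHandler is a hypothetical name), is to subclass the redirect handler and record each target before following it:

```python
import urllib.request


class LoggingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler that records every redirect before following it."""

    def __init__(self):
        super().__init__()
        self.trail = []  # (status code, target URL) pairs, in order

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        self.trail.append((code, newurl))
        print("[redirect %d] %s" % (code, newurl))
        # Let the stock handler decide whether and how to follow it
        return super().redirect_request(req, fp, code, msg, headers, newurl)


# build_opener() uses this instance in place of the default redirect handler
logger = LoggingRedirectHandler()
opener = urllib.request.build_opener(logger)
```

Calling opener.open(...) on the ServiceNow URL would then print each hop of the SSO chain and leave the full list in logger.trail; this needs network access and is not shown here.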