Running a GET over SSL with authentication in Python

Time: 2015-03-26 13:43:15

Tags: python security authentication python-3.x ssl

I have one way to download content from a controlled server: passing a document ID into a link like this:

https://website/deployLink/442/document/download/$NUMBER

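If I navigate to this page in a browser, the file with ID $NUMBER is downloaded.

The problem is that I have 9,000 files on my server, which is SSL-encrypted and normally requires logging in with a username/password in a dialog popup that appears on the web page.

I have already posted a similar thread in which I downloaded the files via WGET. Now I would like to try it with Python, supplying the username/password and going over SSL.

Here is my attempt to grab one file, which results in a 401 error. The full stack trace follows the code.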

import urllib2
import ctypes
from HTMLParser import HTMLParser

# create a password manager
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()

# Add the username and password.
top_level_url = "https://website.com/home.html"
password_mgr.add_password(None, top_level_url, "admin", "password")
handler = urllib2.HTTPBasicAuthHandler(password_mgr)

# create "opener" (OpenerDirector instance)
opener = urllib2.build_opener(handler)

# Install the opener.
# Now all calls to urllib2.urlopen use our opener.
urllib2.install_opener(opener)

# Grab website
response = urllib2.urlopen('https://website/deployLink/442/document/download/1')
html = response.read()

class MyHTMLParser(HTMLParser):
    pass  # class body not shown in the original post

url = 'https://website/deployLink/442/document/download/1'


# Save the file
webpage = urllib2.urlopen(url)
with open('Test.doc','wb') as localFile:
    localFile.write(webpage.read())

What am I doing wrong here? Is what I am attempting even possible?

C:\Python27\python.exe C:/Users/ADMIN/PycharmProjects/GetFile.py
Traceback (most recent call last):
  File "C:/Users/ADMIN/PycharmProjects/GetFile.py", line 22, in <module>
    response = urllib2.urlopen('https://website/deployLink/442/document/download/1')
  File "C:\Python27\lib\urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 437, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 550, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 475, in error
    return self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 558, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 401: Processed

Process finished with exit code 1

Here is my real page, with some information hidden:

Image

The authentication URL ends with :443.

Can someone help me debug this and get it running?

Thanks.

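1 Answer:

Answer 0: (score: 1)

Assuming your code above is accurate, I think your problem is with the URI passed to the add_password method. When setting the username/password you have this: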

# Add the username and password.
top_level_url = "https://website.com/home.html"
password_mgr.add_password(None, top_level_url, "admin", "password")
handler = urllib2.HTTPBasicAuthHandler(password_mgr)

Your subsequent request then goes to this URI:

# Grab website
response = urllib2.urlopen('https://website/deployLink/442/document/download/1')

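(I am assuming the host names were "scrubbed" inconsistently and should be the same, and moving on. See: "website" vs. "website.com")

The second URI is not a child of the first URI, judging by their respective path portions. The URI path /deployLink/442/document/download/1 is not a child of /home.html. From the library's perspective, the second URI has no auth data.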
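
The parent/child matching described above can be checked without touching the network. The following is a minimal sketch, not the asker's actual setup: it uses Python 3's urllib.request (the successor to urllib2, whose password-manager classes carry the same names), and the host name, the root URI "https://website/", and the credentials are placeholders taken from or assumed around the question. It also assumes the server answers with a Basic authentication challenge, which the question does not confirm.

```python
import urllib.request

# Placeholder values mirroring the question; BASE_URL is an assumption
# (the site root, chosen because every download URL lies below it).
BASE_URL = "https://website/"
DOWNLOAD_URL = "https://website/deployLink/442/document/download/1"


def make_password_mgr(registered_uri, user="admin", password="password"):
    """Return a password manager holding credentials for a single URI."""
    mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    mgr.add_password(None, registered_uri, user, password)
    return mgr


# Credentials registered for /home.html do not cover /deployLink/...
narrow = make_password_mgr("https://website/home.html")
narrow_match = narrow.find_user_password(None, DOWNLOAD_URL)

# Credentials registered for the site root cover every path below it.
broad = make_password_mgr(BASE_URL)
broad_match = broad.find_user_password(None, DOWNLOAD_URL)

# An explicit default port (:443 for https) counts as the same authority.
port_match = broad.find_user_password(
    None, "https://website:443/deployLink/442/document/download/1")

print(narrow_match)  # (None, None)
print(broad_match)   # ('admin', 'password')
print(port_match)    # ('admin', 'password')

# An opener built from the broad manager; opener.open(DOWNLOAD_URL) would
# then retry a 401 Basic challenge with the stored credentials.
handler = urllib.request.HTTPBasicAuthHandler(broad)
opener = urllib.request.build_opener(handler)
```

Registering the credentials for the site root, or for any URI whose path is a prefix of the download path, is what lets the handler find them on the retry. If the server uses a different scheme such as Digest or NTLM, a different handler would be needed.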