Question

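I'm trying to log in to my university's server via Python, but I'm entirely unsure how to go about generating the appropriate HTTP POSTs, creating keys and certificates, and handling the other parts of the process I may be unfamiliar with that are required to comply with the SAML spec. I can log in with my browser just fine, but I'd like to be able to log in and access other content on the server using Python.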

For reference, here is the site.

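I've tried logging in with mechanize (selecting the form, populating the fields, clicking the submit button control via mechanize.Browser.submit(), etc.) to no avail; the login page comes back every time.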

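At this point, I'm willing to implement a solution in whichever language is best suited to the task. Basically, I want to programmatically log in to a SAML-authenticated server.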

Answer 1

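Basically, what you have to understand is the workflow behind a SAML authentication process. Unfortunately, there is no PDF out there that really helps in finding out what the browser does when it accesses a SAML-protected website.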

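Maybe you should take a look at something like this: http://www.docstoc.com/docs/33849977/Workflow-to-Use-Shibboleth-Authentication-to-Sign and, obviously, at this: http://en.wikipedia.org/wiki/Security_Assertion_Markup_Language. In particular, focus your attention on the scheme illustrating the authentication flow.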

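When I was trying to understand how SAML works, and since the documentation was so poor, what I did was write down (yes, on paper) every step the browser performed, from the first to the last. I used Opera, configured so that it did not follow automatic redirects (300, 301, 302 response codes, etc.) and with JavaScript disabled. Then I wrote down all the cookies the server sent me, what each one did, and for what reason.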

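Maybe that was too much effort, but this way I was able to write a Java library suited to the job, and a fast and efficient one too. Maybe someday I will release it publicly...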

What you should understand is that in a SAML login there are two actors involved: the IDP (identity provider) and the SP (service provider).

A. First step: the user agent requests the resource from the SP

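I'm quite sure you reached the link you reference in your question from another page, by clicking on something like "Access the protected website". If you pay closer attention, you'll notice that the link you followed is not the one on which the authentication form is displayed. That's because following that link to the SP is itself a step of SAML; the first step, actually. It allows the IDP to establish who you are and why you are trying to access the resource. So, basically, what you need to do is make a request to the link you followed in order to reach the web form, and collect the cookies it sets. What you won't see is the SAMLRequest string, encoded into the 302 redirect that sits behind the link and sent to the IDP to make the connection.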

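I think that's why you couldn't automate the whole process with mechanize. You simply connected to the form, without the identification step having been done!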
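To make the hidden SAMLRequest visible, here is a minimal sketch that decodes it from the redirect URL. It assumes the SP uses the HTTP-Redirect binding, in which the request is raw-DEFLATE compressed, base64-encoded and placed in the query string; some deployments send it in a POST form instead. The function name decode_saml_request and the example URLs are hypothetical.

```python
import base64
import zlib
from urllib.parse import urlsplit, parse_qs


def decode_saml_request(redirect_url):
    """Return the AuthnRequest XML carried in a redirect URL, or None if absent."""
    query = parse_qs(urlsplit(redirect_url).query)
    values = query.get("SAMLRequest")
    if not values:
        return None
    compressed = base64.b64decode(values[0])
    # wbits=-15 selects raw DEFLATE (no zlib header), as used by the redirect binding.
    return zlib.decompress(compressed, -15).decode("utf-8")


# Assumed usage: fetch the protected page without following redirects, e.g.
#   resp = requests.get("https://sp.example.org/secure", allow_redirects=False)
#   print(decode_saml_request(resp.headers["Location"]))
```

Reading the decoded XML is mostly useful for debugging: it shows which IDP endpoint and which assertion consumer URL the SP is asking for.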

B. Second step: filling in the form and submitting it

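This one is easy. But be careful! The cookies that are set now are not the same as the cookies above. You are now connecting to an entirely different website. That's the reason SAML is used: different websites, same credentials. So you may want to store the authentication cookies provided by a successful login in a separate variable. The IDP will now send you back a response (following the SAMLRequest): the SAMLResponse. You have to detect it by fetching the source code of the web page on which the login ends. In fact, this page is one big form containing the response, with some JavaScript code that submits it automatically when the page loads. You have to get the source code of the page, parse it to get rid of all the useless HTML, and extract the SAMLResponse (it is encoded, and possibly encrypted).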
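The parsing described above can be sketched with the standard library's html.parser. This assumes the page returned after login contains a single form whose hidden inputs include one named SAMLResponse; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class AutoSubmitFormParser(HTMLParser):
    """Collect the action URL and the named inputs of the first <form> on a page."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Attribute values arrive already HTML-unescaped.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def extract_saml_form(html):
    """Return (action, fields) if the page carries a SAMLResponse, else None."""
    parser = AutoSubmitFormParser()
    parser.feed(html)
    if "SAMLResponse" not in parser.fields:
        return None
    return parser.action, parser.fields
```

Any other hidden field in that form (RelayState is a common one) should be kept and posted along with the SAMLResponse.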

C. Third step: sending the response back to the SP

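Now you are ready to finish the procedure. You have to send the SAMLResponse obtained in the previous step to the SP (via POST, since you are emulating a form). In return, the SP will provide the cookies needed to access the protected content you want.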
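As a rough illustration of this last step, the sketch below builds the POST that the browser would normally auto-submit, without sending it. It assumes you already have the form's action URL and its fields from the previous step; build_sp_post and the example URLs are hypothetical.

```python
import urllib.parse
import urllib.request


def build_sp_post(action_url, fields, page_url=None):
    """Build (but do not send) the form POST that delivers the SAMLResponse."""
    if page_url:
        # The form action may be relative to the page that contained it.
        action_url = urllib.parse.urljoin(page_url, action_url)
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(
        action_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Assumed usage, with a cookie-aware opener so the SP's session cookie is kept:
#   opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor())
#   response = opener.open(build_sp_post(action, fields, page_url=login_result_url))
```

The same opener (or session) should be used for the first request to the SP and for this POST, so that the cookies set in step A are sent back.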

Aaaaand, you're done!

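Again, I think the most valuable thing you can do is use Opera and analyze ALL the redirects SAML performs. Then replicate them in your code. It's not that difficult; just keep in mind that the IDP is an entirely different site from the SP.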

Answer 2

Selenium with the headless PhantomJS WebKit will be your best bet for logging in to Shibboleth, because it handles cookies and even JavaScript for you.

Installation:

$ pip install selenium
$ brew install phantomjs

from selenium import webdriver
from selenium.webdriver.support.ui import Select # for <SELECT> HTML form

driver = webdriver.PhantomJS()
# On Windows, use: webdriver.PhantomJS('C:\phantomjs-1.9.7-windows\phantomjs.exe')

# Service selection
# Here I had to select my school among others 
driver.get("http://ent.unr-runn.fr/uPortal/")
select = Select(driver.find_element_by_name('user_idp'))
select.select_by_visible_text('ENSICAEN')
driver.find_element_by_id('IdPList').submit()

# Login page (https://cas.ensicaen.fr/cas/login?service=https%3A%2F%2Fshibboleth.ensicaen.fr%2Fidp%2FAuthn%2FRemoteUser)
# Fill the login form and submit it
driver.find_element_by_id('username').send_keys("myusername")
driver.find_element_by_id('password').send_keys("mypassword")
driver.find_element_by_id('fm1').submit()

# Now connected to the home page
# Click on 3 links in order to reach the page I want to scrape
driver.find_element_by_id('tabLink_u1240l1s214').click()
driver.find_element_by_id('formMenu:linknotes1').click()
driver.find_element_by_id('_id137Pluto_108_u1240l1n228_50520_:tabledip:0:_id158Pluto_108_u1240l1n228_50520_').click()

# Select and print an interesting element by its ID
page = driver.find_element_by_id('_id111Pluto_108_u1240l1n228_50520_:tableel:tbody_element')
print(page.text)

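Notes:

During development, use Firefox to preview what you are doing: driver = webdriver.Firefox()
This script is provided as is, with the corresponding links, so you can compare each line of code with the actual source code of the pages (at least up to the login).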

Answer 3

Extending the answer from Stéphane Bruckert above: once you have used Selenium to obtain the authentication cookies, you can still switch to requests:

import requests
cook = {i['name']: i['value'] for i in driver.get_cookies()}
driver.quit()
r = requests.get("https://protected.ac.uk", cookies=cook)
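The dict comprehension above keeps only cookie names and values. If the login spans several hosts, it may help to keep each cookie's domain and path as well. The sketch below does that with the standard library's http.cookiejar; it assumes the cookie dicts have the keys Selenium's get_cookies() returns (name, value, domain, path, secure, httpOnly, expiry), and selenium_cookies_to_jar is a hypothetical helper name.

```python
from http.cookiejar import Cookie, CookieJar


def selenium_cookies_to_jar(selenium_cookies):
    """Convert Selenium-style cookie dicts into a CookieJar, keeping domain and path."""
    jar = CookieJar()
    for c in selenium_cookies:
        domain = c.get("domain", "")
        expiry = c.get("expiry")  # absent for session cookies
        jar.set_cookie(Cookie(
            version=0, name=c["name"], value=c["value"],
            port=None, port_specified=False,
            domain=domain, domain_specified=bool(domain),
            domain_initial_dot=domain.startswith("."),
            path=c.get("path", "/"), path_specified=True,
            secure=bool(c.get("secure", False)),
            expires=expiry, discard=expiry is None,
            comment=None, comment_url=None,
            rest={"HttpOnly": None} if c.get("httpOnly") else {},
        ))
    return jar


# Assumed usage: requests accepts a CookieJar wherever it accepts a cookie dict.
#   jar = selenium_cookies_to_jar(driver.get_cookies())
#   r = requests.get("https://protected.ac.uk", cookies=jar)
```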

Answer 4

You can find here a more detailed description of the Shibboleth authentication process.

Answer 5

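I wrote a simple Python script that is able to log in to a Shibbolized page.

First, I used Live HTTP Headers in Firefox to watch the redirects for the particular Shibbolized page I was targeting.

Then I wrote a simple script using urllib.request (in Python 3.4, but urllib2 in Python 2.x seems to have the same functionality). I found that the default redirect following of urllib.request worked for my purposes; however, I found it nice to subclass urllib.request.HTTPRedirectHandler and, in this subclass (class ShibRedirectHandler), add a handler for all http_error_302 events.

In this subclass I just print out the values of the parameters (for debugging purposes). Note that in order to keep the default redirect following, you need to end the handler with return HTTPRedirectHandler.http_error_302(self, args...) (i.e. a call to the base class http_error_302 handler).

The most important component for making urllib work with Shibbolized authentication is an OpenerDirector that has cookie handling added. You build the OpenerDirector as follows: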

cookieprocessor = urllib.request.HTTPCookieProcessor()
opener = urllib.request.build_opener(ShibRedirectHandler, cookieprocessor)
response = opener.open("https://shib.page.org")

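Here is a full script that may get you started (you will need to change the few mock URLs I provided and enter a valid username and password). It uses Python 3 classes; to make it work in Python 2, replace urllib.request with urllib2 and urllib.parse.urlencode with urllib.urlencode: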

import urllib.request
import urllib.parse

#Subclass of HTTPRedirectHandler. Does not do much, but is very
#verbose: it prints out all the redirects. Compare with what you see
#when looking at your browser's redirects (using Live HTTP Headers or similar)
class ShibRedirectHandler (urllib.request.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        print (req)
        print (fp.geturl())
        print (code)
        print (msg)
        print (headers)
        #without this return (passing parameters onto baseclass) 
        #redirect following will not happen automatically for you.
        return urllib.request.HTTPRedirectHandler.http_error_302(self,
                                                          req,
                                                          fp,
                                                          code,
                                                          msg,
                                                          headers)

cookieprocessor = urllib.request.HTTPCookieProcessor()
opener = urllib.request.build_opener(ShibRedirectHandler, cookieprocessor)

#Edit: should be the URL of the site/page you want to load that is protected with Shibboleth
(opener.open("https://shibbolized.site.example").read())

#Inspect the page source of the Shibboleth login form; find the input names for the username
#and password, and edit according to the dictionary keys here to match your input names
loginData = urllib.parse.urlencode({'username':'<your-username>', 'password':'<your-password>'})
bLoginData = loginData.encode('ascii')

#By looking at the source of your Shib login form, find the URL the form action posts back to
#hard code this URL in the mock URL presented below.
#Make sure you include the URL, port number and path
response = opener.open("https://test-idp.server.example", bLoginData)
#See what you got.
print (response.read())

Answer 6

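Mechanize can do the job as well, except that it doesn't handle JavaScript. Authentication worked, but once on the home page I couldn't follow a link like this one: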

<a href="#" id="formMenu:linknotes1"
   onclick="return oamSubmitForm('formMenu','formMenu:linknotes1');">

If you need JavaScript, better use Selenium with PhantomJS. Otherwise, I hope you will find some inspiration in this script:

#!/usr/bin/env python
#coding: utf8
import sys, logging
import mechanize
import cookielib
from BeautifulSoup import BeautifulSoup
import html2text

br = mechanize.Browser() # Browser
cj = cookielib.LWPCookieJar() # Cookie Jar
br.set_cookiejar(cj) 

# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)

# Follows refresh 0 but not hangs on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

# User-Agent
br.addheaders = [('User-agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36')]

br.open('https://ent.unr-runn.fr/uPortal/')
br.select_form(nr=0)
br.submit()

br.select_form(nr=0)
br.form['username'] = 'myusername'
br.form['password'] = 'mypassword'
br.submit()

br.select_form(nr=0)
br.submit()

rs = br.open('https://ent.unr-runn.fr/uPortal/f/u1240l1s214/p/esup-mondossierweb.u1240l1n228/max/render.uP?pP_org.apache.myfaces.portlet.MyFacesGenericPortlet.VIEW_ID=%2Fstylesheets%2Fetu%2Fdetailnotes.xhtml')

# Eventually comparing the cookies with those on Live HTTP Header: 
print "Cookies:"
for cookie in cj:
    print cookie

# Displaying page information
print rs.read()
print rs.geturl()
print rs.info()

# And that last line didn't work
rs = br.follow_link(id="formMenu:linknotes1", nr=0)

Answer 7

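I ran into a similar problem with my university's SAML authentication page.

The basic idea is to use a requests.session object to automatically handle most of the HTTP redirects and the cookie storage. However, many of the redirects were also done with JavaScript, and this caused several problems for the simple requests-based solution.

I ended up using Fiddler to track every request my browser made to the university server, in order to fill in the redirects I had missed. It really made the process easier.

My solution is far from ideal, but it seems to work.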
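This answer did not include its code, so the following is only a sketch of what such a session-based login could look like, not the author's actual solution. It assumes a session object with requests.Session-style get(url) and post(url, data=...) methods, a login page whose first form holds the credentials, and JavaScript "redirects" that are really auto-submitted forms carrying a SAMLResponse. The names saml_login, user_field and pass_field are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _FirstForm(HTMLParser):
    """Record the action and named inputs of the first form in a page."""

    def __init__(self):
        super().__init__()
        self.action, self.fields, self._state = None, {}, "before"

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self._state == "before":
            self._state, self.action = "inside", attrs.get("action") or ""
        elif tag == "input" and self._state == "inside" and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._state == "inside":
            self._state = "after"


def first_form(html):
    parser = _FirstForm()
    parser.feed(html)
    return parser.action, parser.fields


def saml_login(session, protected_url, username, password,
               user_field="username", pass_field="password", max_hops=5):
    # Ask the SP for the resource; the session follows the HTTP redirects to the IDP.
    resp = session.get(protected_url)
    action, fields = first_form(resp.text)
    if action is None:
        raise ValueError("no login form found at " + resp.url)
    # Fill in the credentials, keeping hidden fields such as CSRF tokens.
    fields[user_field], fields[pass_field] = username, password
    resp = session.post(urljoin(resp.url, action), data=fields)
    # A browser would auto-submit the SAMLResponse form with JavaScript;
    # without a JS engine it has to be posted by hand.
    for _ in range(max_hops):
        action, fields = first_form(resp.text)
        if "SAMLResponse" not in fields:
            break
        resp = session.post(urljoin(resp.url, action), data=fields)
    return resp


# Assumed usage:
#   import requests
#   page = saml_login(requests.Session(), "https://sp.example.org/secure", "user", "pass")
```

Real deployments often insert extra pages (IDP selection, consent screens, storage checks), which is where a tool like Fiddler helps find the hops this sketch does not cover.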

Answer 8

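Although this has already been answered, I hope this helps someone. I had the task of downloading files from a SAML website and got help from Stéphane Bruckert's answer.

If headless mode is used, a wait time needs to be specified at the points where the login redirects happen. Once the browser had logged in, I took its cookies and used them with the requests module to download the files (I got help from this).

This is what my code looks like: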

import requests
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Placeholder values that change in the download URL; replace with your own.
things_to_download = ['a', 'b', 'c', 'd', 'e', 'f']
options = Options()
options.headless = False
# Note: this uses the Selenium 3 API; Selenium 4 replaced
# find_element_by_id(...) with find_element(By.ID, ...).
driver = webdriver.Chrome('D:/chromedriver.exe', options=options)
driver.get('https://website.to.downloadfrom.com/')
driver.find_element_by_id('username').send_keys("Your_username")  # the ID differs between websites/forms
driver.find_element_by_id('password').send_keys("Your_password")
driver.find_element_by_id('logOnForm').submit()

# Copy the browser's cookies into a requests session once, before downloading.
session = requests.Session()
for cookie in driver.get_cookies():
    session.cookies.set(cookie['name'], cookie['value'])

for thing in things_to_download:
    response = session.get('https://website.to.downloadfrom.com/bla/blabla/' + str(thing))
    with open('Downloaded_stuff/' + str(thing) + '.pdf', 'wb') as f:
        f.write(response.content)  # save the file
driver.close()
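The wait between login redirects mentioned in this answer is not shown in the script. One way to sketch it is a small polling helper that waits until the browser lands on the expected URL before the cookies are read. wait_for_url is a hypothetical name, and the timeout values are arbitrary; Selenium's own WebDriverWait with expected_conditions.url_contains serves the same purpose.

```python
import time


def wait_for_url(get_current_url, target_prefix, timeout=30.0, poll=0.5):
    """Poll until the current URL starts with target_prefix; return True on success."""
    deadline = time.monotonic() + timeout
    while True:
        if get_current_url().startswith(target_prefix):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


# Assumed usage, placed after the login form is submitted and before get_cookies():
#   if not wait_for_url(lambda: driver.current_url, 'https://website.to.downloadfrom.com/'):
#       raise RuntimeError('login did not complete in time')
```

Passing a callable instead of the driver keeps the helper independent of Selenium, so the same loop works with any browser automation tool.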

Answer 9

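If all else fails, I'd suggest using Selenium's webdriver in "headful" mode (i.e. a browser window opens, allowing you to type in the username, password and any other necessary login information) to get into the target website, even if your form is more complicated than the standard "username" and "password" duo and you aren't sure how to fill in the br.form section mentioned in other answers.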

from selenium import webdriver
import time

DRIVER_PATH = r'C:/INSERT_YOUR_PATH_HERE/chromedriver.exe'
driver = webdriver.Chrome(executable_path=DRIVER_PATH)
driver.get('https://moodle.tau.ac.il/login/index.php') # This is the login screen

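Once you have done this, you can create a loop that checks whether the target URL has been reached. If it has, you're in! This code worked for me. My goal was to access my university's course website, Moodle, and automatically download all the PDFs.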

targetUrl = False
timeElapsed = 0

def downloadAllPDFs():         # Or any other function you'd like, the point is that 
    print("Access Granted!")   # you now have access to the HTML. 

while not targetUrl and timeElapsed < 60:
    time.sleep(1)
    timeElapsed += 1
    if driver.current_url == r"https://moodle.tau.ac.il/my/": # The site you're trying to login to.
        downloadAllPDFs()
        targetUrl = True

Answer 10

I wrote this code following the accepted answer. It has worked for me in two separate projects.


Logging in to a SAML/Shibboleth authenticated server using Python
