Using Python requests - multi-page login

Time: 2017-10-26 20:57:15

Tags: python python-requests

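I'm trying to scrape data from SiteImprove, which requires logging in first. I'm attempting this with requests, and the problem I've run into is the two-page login form: you enter your email on the first page and click submit, which takes you to a second page where you enter your password.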

Here is my code:

import bs4
import requests

# SiteImprove login info
email_url = 'https://id.siteimprove.com/account/login?returnUrl=%2Fconnect%2Fauthorize%2Flogin%3Fclient_id%3Dmy2%26redirect_uri%3Dhttps%253A%252F%252Fmy2.siteimprove.com%252FAuth%252FAuthCallback%26response_mode%3Dform_post%26response_type%3Dcode%2520id_token%26scope%3Dopenid%2520profile%2520si.profile%26state%3DOpenIdConnect.AuthenticationProperties%253D_yfrURqUrguweaEJxfHZDPeOyW6Ds2DipoV5lLY0HKQ3AY_ziCYNQ6aNHj4TTJRQg-_BWvnKGixEMNgRXZi6yuNXh7l1XxuHP0wPc0Cj0B32XGlbnlfa0JYZ4hL9jJ7zrSLVBpK1SUPCEd5os5PmoyAv_ahAEesMQ5COXG8lY71klcoXz_vQRnXF9CzFR_AWMJnabKhychE0TMUS8li8Ao2O3YHPeGbNQls91wcWQso%26nonce%3D636446439257723222.ZmNmODMzMzUtNDlkYS00ZWM1LTk4NDYtNDMyYWQ5ZmI4MGE4M2QzMjc2M2QtYmJiZC00OTEyLThlNWItODEwOWVlNGFmNmNi'
password_url = 'https://id.siteimprove.com/account/LoginPassword'

email_data = {
    'Email': 'EMAIL',
    '__RequestVerificationToken': 'PULLED FROM CHROME DEV TOOLS ON LOGIN PAGE',
    'ReturnUrl': '/connect/authorize/login?client_id=my2&redirect_uri=https%3A%2F%2Fmy2.siteimprove.com%2FAuth%2FAuthCallback&response_mode=form_post&response_type=code%20id_token&scope=openid%20profile%20si.profile&state=OpenIdConnect.AuthenticationProperties%3D_yfrURqUrguweaEJxfHZDPeOyW6Ds2DipoV5lLY0HKQ3AY_ziCYNQ6aNHj4TTJRQg-_BWvnKGixEMNgRXZi6yuNXh7l1XxuHP0wPc0Cj0B32XGlbnlfa0JYZ4hL9jJ7zrSLVBpK1SUPCEd5os5PmoyAv_ahAEesMQ5COXG8lY71klcoXz_vQRnXF9CzFR_AWMJnabKhychE0TMUS8li8Ao2O3YHPeGbNQls91wcWQso&nonce=636446439257723222.ZmNmODMzMzUtNDlkYS00ZWM1LTk4NDYtNDMyYWQ5ZmI4MGE4M2QzMjc2M2QtYmJiZC00OTEyLThlNWItODEwOWVlNGFmNmNi'
}
password_data = {
    'Email': 'EMAIL',
    'Password': 'PASSWORD',
    'ReturnUrl': '/connect/authorize/login?client_id=my2&redirect_uri=https%3A%2F%2Fmy2.siteimprove.com%2FAuth%2FAuthCallback&response_mode=form_post&response_type=code%20id_token&scope=openid%20profile%20si.profile&state=OpenIdConnect.AuthenticationProperties%3D_yfrURqUrguweaEJxfHZDPeOyW6Ds2DipoV5lLY0HKQ3AY_ziCYNQ6aNHj4TTJRQg-_BWvnKGixEMNgRXZi6yuNXh7l1XxuHP0wPc0Cj0B32XGlbnlfa0JYZ4hL9jJ7zrSLVBpK1SUPCEd5os5PmoyAv_ahAEesMQ5COXG8lY71klcoXz_vQRnXF9CzFR_AWMJnabKhychE0TMUS8li8Ao2O3YHPeGbNQls91wcWQso&nonce=636446439257723222.ZmNmODMzMzUtNDlkYS00ZWM1LTk4NDYtNDMyYWQ5ZmI4MGE4M2QzMjc2M2QtYmJiZC00OTEyLThlNWItODEwOWVlNGFmNmNi',
    'RememberLogin': 'false',
    '__RequestVerificationToken': 'PULLED FROM CHROME DEV TOOLS ON LOGIN PAGE'
}

# Start a session so we can have persistent cookies
with requests.Session() as s:
    enter_email_page = s.post(email_url, data=email_data)
    enter_password_page = s.post(password_url, data=password_data)
    # print the html returned
    r = s.get('https://my2.siteimprove.com/Dashboard/73990/Dashboard2/Index')
    print(r.text)

When I print at the end, I get the HTML of the original login page rather than the restricted page I'm trying to access.
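To narrow down which step falls back to the login page, it may help to log the status code, final URL, and redirect chain of each response instead of printing only the last page. A minimal sketch, where `describe_response` is a hypothetical helper name and the only attributes used are `status_code`, `url`, and `history` from the requests response object:

```python
def describe_response(label, response):
    """Summarise one step of the login flow (hypothetical helper)."""
    # response.history lists the redirect responses that were followed
    # before the final response was returned.
    hops = [f"{r.status_code} {r.url}" for r in response.history]
    hops.append(f"{response.status_code} {response.url}")
    return f"{label}: " + " -> ".join(hops)
```

Calling it on `enter_email_page`, `enter_password_page`, and `r` in turn would show whether the redirect back to the login page already happens on the first POST or only at the dashboard request.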

Thanks for any suggestions!
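One thing that may be worth trying, offered as an untested sketch rather than a confirmed fix: the `__RequestVerificationToken` above is copied by hand from the browser, and anti-forgery tokens of this kind are usually paired with a cookie issued together with the page, so a token taken from a different browser session would likely be rejected. The sketch below instead GETs each page inside the same session and reads the hidden fields from the HTML it receives. `HiddenFieldParser`, `parse_form`, and `login` are hypothetical names; the field names `Email`, `Password`, and `RememberLogin` are taken from the code above; and it is an assumption that the response to the e-mail POST is the password page. It uses the standard-library `html.parser`, though `bs4` would work equally well.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HiddenFieldParser(HTMLParser):
    """Collects the first form's action and every hidden <input> on the page."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self.action is None:
            self.action = attrs.get("action") or ""
        elif tag == "input" and (attrs.get("type") or "").lower() == "hidden":
            name = attrs.get("name")
            if name:
                self.fields[name] = attrs.get("value") or ""


def parse_form(html, page_url):
    """Return (absolute form action URL, dict of hidden fields)."""
    parser = HiddenFieldParser()
    parser.feed(html)
    # An empty or missing action means the form posts back to the page itself.
    return urljoin(page_url, parser.action or ""), parser.fields


def login(session, start_url, email, password):
    """Walk the two-page form; `session` is expected to be a requests.Session."""
    # Step 1: GET the e-mail page so the session receives its cookies
    # and a token that matches them.
    page = session.get(start_url)
    action, fields = parse_form(page.text, page.url)
    fields["Email"] = email

    # Step 2: POST the e-mail; the response is assumed to be the password page,
    # which carries its own hidden fields.
    page = session.post(action, data=fields)
    action, fields = parse_form(page.text, page.url)
    fields.update({"Email": email, "Password": password})
    fields.setdefault("RememberLogin", "false")
    return session.post(action, data=fields)
```

Usage would look like `with requests.Session() as s: result = login(s, email_url, 'EMAIL', 'PASSWORD')`. Note also that the login URL asks for `response_mode=form_post`, which means the final response may be an HTML page containing an auto-submitting form that a browser would POST to the `redirect_uri`. requests does not execute JavaScript, so that form would probably have to be read with `parse_form` and posted in the same way before the dashboard request succeeds.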

0 answers:

No answers