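Below is some code I have been using to try to log in to the Cook's Illustrated website (https://www.cooksillustrated.com/sign_in).
I start a session, grab the authenticity token and the hidden encoding field, then pass the "name" and "value" of the email and password fields (found by inspecting the elements in Chrome). The form does not appear to contain any other elements; however, the POST does not log me in.
I noticed that all of the CSRF tokens end in "==", so I tried stripping those off. That did not work.
I also tried changing the POST to use the "id" attribute of the form inputs instead of "name" (just a shot in the dark, really; from the other examples I have seen, "name" looks like it should work).
Any ideas would be much appreciated.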
import requests, lxml.html
s = requests.session()
# go to the login page and get its text
login = s.get('https://www.cooksillustrated.com/sign_in')
login_html = lxml.html.fromstring(login.text)
# find the hidden fields names and values; store in a dictionary
hidden_inputs = login_html.xpath(r'//form//input[@type="hidden"]')
form = {x.attrib['name']: x.attrib['value'] for x in hidden_inputs}
print(form)
# I noticed that they all ended in two = signs, so I tried taking that off
# form['authenticity_token'] = form['authenticity_token'][:-2]
# this adds to the form payload the two named fields for user name and password
# found using the "inspect elements" on the login screen
form['user[email]'] = 'my_email'
form['user[password]'] = 'my_pw'
# this uses "id" instead of "name" from the input fields
#form['user_email'] = 'my_email'
#form['user_password'] = 'my_pw'
response = s.post('https://www.cooksillustrated.com/sign_in', data=form)
print(form)
# trying to see if it worked - but the response URL is login again instead of main page
# and it can't find my name
# responses are okay, but I think that just means it posted the form
print(response.url)
print('Christopher' in response.text)
print(response.status_code)
print(response.ok)
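Answer 0 (score: 0)
Well, the POST request URL should be https://www.cooksillustrated.com/sessions, not the sign_in page. If you capture all of the traffic while logging in through the browser, you will find the actual POST request sent to the server: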
POST /sessions HTTP/1.1
Host: www.cooksillustrated.com
Connection: keep-alive
Content-Length: 179
Cache-Control: max-age=0
Origin: https://www.cooksillustrated.com
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36
Content-Type: application/x-www-form-urlencoded
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Referer: https://www.cooksillustrated.com/sign_in
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.8
utf8=%E2%9C%93&authenticity_token=Uvku64N8V2dq8z%2BGerrqWNobn03Ydjvz8xqgOAvfBmvDM%2B71xJWl2DmRU4zbBE15gGVESmDKP2E16KIqBeAJ0g%3D%3D&user%5Bemail%5D=demo&user%5Bpassword%5D=demodemo
Note that the last line is the URL-encoded body of this request. It contains 4 parameters: utf8, authenticity_token, user[email] and user[password].
So in your case, form should include all of them:
form = {'user[email]': 'my_email',
        'user[password]': 'my_pw',
        'utf8': '✓',  # the captured request names this parameter utf8
        'authenticity_token': 'xxxxxx'  # placeholder; make sure you keep the trailing '=='
        }
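To double-check which parameters the browser actually sent, the captured body quoted above can be decoded with the standard library. This is only a small sketch: `captured_body` and `params` are names chosen here for illustration, and the string is the last line of the captured request.

```python
from urllib.parse import parse_qs

# Body of the captured POST /sessions request, split across lines for readability.
captured_body = (
    'utf8=%E2%9C%93'
    '&authenticity_token=Uvku64N8V2dq8z%2BGerrqWNobn03Ydjvz8xqgOAvfBmvDM%2B71xJWl2DmRU4zbBE15gGVESmDKP2E16KIqBeAJ0g%3D%3D'
    '&user%5Bemail%5D=demo'
    '&user%5Bpassword%5D=demodemo'
)

# parse_qs returns a list of values per name; each name occurs once here.
params = {name: values[0] for name, values in parse_qs(captured_body).items()}

for name, value in params.items():
    print(name, '=', value)
```

The decoded names are utf8, authenticity_token, user[email] and user[password], and the decoded token still ends in '=='. That ending looks like Base64 padding, which would make it part of the token value rather than something to strip.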
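In addition, you may want to add some headers so that the request looks like it comes from Chrome (or whichever browser you prefer). By default, requests sends a User-Agent such as python-requests/2.13.0, and some websites do not like traffic from "robots":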
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Accept-Encoding': 'gzip, deflate, br',
... # more
}
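One way to apply such headers, sketched here under the assumption that the same headers should go out with every request in the session (the GET of the sign-in page as well as the POST), is to set them once on the session object. `browser_headers` is a name picked for this example, and only two of the captured headers are included.

```python
import requests

s = requests.Session()

# What requests would send if nothing is overridden.
default_ua = s.headers['User-Agent']
print(default_ua)

browser_headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
}

# Session-level headers are merged into every request made through s.
s.headers.update(browser_headers)
print(s.headers['User-Agent'])
```

Passing headers=... to each individual call works too; setting them on the session just avoids repeating the argument.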
Now we are ready to make the POST request:
response = s.post('https://www.cooksillustrated.com/sessions', data=form, headers=headers)
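Putting the pieces together, here is a minimal sketch of the whole flow. It makes several assumptions: the authenticity token has to be read fresh from the sign-in page within the same session rather than hard-coded, the hidden inputs inside the form are exactly the extra fields the server expects, and the standard-library `html.parser` is used as a stand-in for the lxml XPath in the question. `HiddenInputCollector`, `build_login_form` and `sign_in` are hypothetical helper names, and whether the site accepts the result has not been verified.

```python
from html.parser import HTMLParser

SIGN_IN_URL = 'https://www.cooksillustrated.com/sign_in'
SESSIONS_URL = 'https://www.cooksillustrated.com/sessions'


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of hidden <input> elements inside <form> tags."""

    def __init__(self):
        super().__init__()
        self.form_depth = 0
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == 'form':
            self.form_depth += 1
        elif (tag == 'input' and self.form_depth > 0
              and attributes.get('type') == 'hidden' and 'name' in attributes):
            self.fields[attributes['name']] = attributes.get('value') or ''

    def handle_endtag(self, tag):
        if tag == 'form' and self.form_depth > 0:
            self.form_depth -= 1


def build_login_form(login_html, email, password):
    """Return the POST payload: hidden fields plus the two credentials."""
    collector = HiddenInputCollector()
    collector.feed(login_html)
    form = dict(collector.fields)  # utf8 and authenticity_token, unmodified
    form['user[email]'] = email
    form['user[password]'] = password
    return form


def sign_in(session, email, password, headers=None):
    """Fetch the sign-in page, then POST the form to /sessions."""
    page = session.get(SIGN_IN_URL)
    form = build_login_form(page.text, email, password)
    return session.post(SESSIONS_URL, data=form, headers=headers)
```

With requests installed, it could be called as `response = sign_in(requests.Session(), 'my_email', 'my_pw', headers=headers)`, after which `response.url` and `response.text` can be inspected as in the question. Taking the session as a parameter keeps the cookies from the GET and the token from the same page tied to the POST.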