Python requests login fails: the page has expired

Date: 2019-08-29 09:59:08

标签: python web-scraping python-requests

I am trying to use a Python requests POST to retrieve some HTML from behind a login page, but my code fails: the returned HTML contains "... The page has expired due to inactivity."

Here is my code:

import requests 

url_login = u"https://savethewater-game.com/login" 

headers = {
  'referer': 'https://savethewater-game.com/login',
  'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36'
}

payload = {
    '_token': u'mUaaXNup3vCEtiln5QeQNJNwoO8LrCH9opoVE4GH',
    'email': u'someone@gmail.com', # fake email
    'password': u'12345678' # fake pass
}

with requests.Session() as session:
    p = session.post(url_login, headers=headers, data=payload)
    print(p.text)

The login request captured in Chrome's developer tools looks like this:

:authority: savethewater-game.com
:method: POST
:path: /login
:scheme: https
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3
accept-encoding: gzip, deflate, br
accept-language: en,it;q=0.9,zh-CN;q=0.8,zh;q=0.7,zh-TW;q=0.6,en-US;q=0.5
cache-control: max-age=0
content-length: 93
content-type: application/x-www-form-urlencoded
cookie: XSRF-TOKEN=eyJpdiI6ImtvRW5UT1BMNjkxNVFBc1d2OVJKZ3c9PSIsInZhbHVlIjoicVVCYlhFRG50QmVKd3V1Yzh4NnNldUhvRXpZOWVSRDFiUGNsT1E4aG9oOUFpYlZ0M1BaRFwvR3VkK1Q4MkhLOFlBZDlxUWp4R0s4YjU4aTZGc0I0RVZ3PT0iLCJtYWMiOiIzYzMxMmI0ZjlhOTM0YzVjZjA5NDk2MDkxMDJlY2VlMjVmNjhiYTJiM2E2OTlkYmYzOTIyYzJiYTM0NTJhMWMyIn0%3D; savethewater_session=eyJpdiI6IjltY2M3alp2endPdWY4VmVpNGhKMXc9PSIsInZhbHVlIjoiVjR2T2lHempPVGM1YW04YldtbGkxcWU3TlwvU1N2RTRcL0VoMzFPY2RLb245bXo0bVJreDl0UnBMYlFjaDNOZlZlMEQ2YVpKVXU3QVYxWWRGNW13bE9wdz09IiwibWFjIjoiNjk0YTdmNTFmYzJiMzg2MDA3NmRiOGU5OTUwMWVkMDE3ZmRkZDY1NzUzMjVjMTYxNzljNjNlZTc4NzE5ODYyNiJ9
origin: https://savethewater-game.com
referer: https://savethewater-game.com/login
sec-fetch-mode: navigate
sec-fetch-site: same-origin
sec-fetch-user: ?1
upgrade-insecure-requests: 1
user-agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36
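Note that only a subset of these fields needs to be reproduced in requests (a sketch; every value below is copied verbatim from the capture above): the ":authority"/":method"/":path"/":scheme" entries are HTTP/2 pseudo-headers generated by the browser and cannot be set by hand, while "content-length" and "cookie" are filled in by requests itself.

browser_headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
    'accept-language': 'en,it;q=0.9,zh-CN;q=0.8,zh;q=0.7,zh-TW;q=0.6,en-US;q=0.5',
    'cache-control': 'max-age=0',
    'origin': 'https://savethewater-game.com',
    'referer': 'https://savethewater-game.com/login',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36',
}
# "content-length" and "cookie" are omitted on purpose: requests computes the
# former and a requests.Session manages the latter automatically.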

Some posts mention that the cause might be the site blocking automated scraping. I would like to know whether my code is at fault or whether it is something else. Many thanks!
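For what it is worth, "The page has expired due to inactivity." is Laravel's standard response to a CSRF token/session mismatch rather than a bot-detection message, so a quick status-code check usually tells the two apart. A minimal diagnostic sketch (reusing the stale values from the question above):

import requests

url_login = "https://savethewater-game.com/login"

# The hard-coded token below is the stale one from the question. A CSRF/session
# mismatch in Laravel typically answers with HTTP 419 ("Page Expired"), whereas
# outright bot blocking tends to show up as 403 or 429.
payload = {
    '_token': 'mUaaXNup3vCEtiln5QeQNJNwoO8LrCH9opoVE4GH',
    'email': 'someone@gmail.com',
    'password': '12345678'
}

with requests.Session() as session:
    p = session.post(url_login, data=payload)
    print(p.status_code)                      # 419 points at a stale _token, not at blocking
    print('The page has expired' in p.text)   # True when the CSRF check failed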

1 answer:

Answer 0: (score: 0)

It is hard to give a definite solution without being able to test against real credentials, but try the approach below; it should work. The idea is to load the login page first, so that the _token you post is the fresh one belonging to the session your script is actually using, rather than a stale hard-coded value.

import requests
from bs4 import BeautifulSoup  # parsing with "lxml" below also requires the lxml package

url_login = "https://savethewater-game.com/login"

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36'
}

payload = {
    '_token': '',                  # filled in with a fresh token below
    'email': 'someone@gmail.com',  # fake email
    'password': '12345678'         # fake pass
}

with requests.Session() as session:
    # Load the login page first so the server issues a session cookie
    # and a CSRF token that belong together.
    res = session.get(url_login)

    # Mirror the first cookie of the response in the request headers
    # (the Session's own cookie jar keeps all cookies from the GET as well).
    cookie_val = res.headers['Set-Cookie'].split(";")[0]
    headers['cookie'] = cookie_val

    # Read the fresh _token from the hidden input of the login form.
    soup = BeautifulSoup(res.text, "lxml")
    token = soup.select_one('input[name="_token"]')['value']
    payload['_token'] = token

    p = session.post(url_login, data=payload, headers=headers)
    print(p.content)
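
To confirm that the login actually went through, a few lines can be appended inside the with block above (a sketch; the exact success criteria depend on the site):

    # A successful Laravel login normally redirects away from /login, and the
    # expiry message should no longer appear in the response body.
    print(p.status_code)                      # expect 200 after the redirect is followed
    print(p.url)                              # should no longer be the /login URL
    print('The page has expired' in p.text)   # expect False
    # Pages behind the login can then be fetched with the same session, e.g.
    # (the path below is only a placeholder, not taken from the site):
    # protected = session.get('https://savethewater-game.com/home')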