Logging in to a website with Python requests: 400 - Bad Request

Date: 2017-04-05 12:26:37

Tags: python login web-scraping python-requests lxml

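I'm trying to log in to a website using requests.

The login process has two steps:

  • Step 1: enter the email address on the first page. The first page's source is: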

     <div id="content" class="js-container" data-component="two-step-login-form"> 
      <div class="lgn-box">
       <form name="enter-email-form" action="/login/submitEmail" class="js-email-lookup-form" method="POST" data-test-id="enter-email-form" novalidate="true">
        <input name="location" value="https://www.mywebsite.com/" type="hidden">
        <input name="continueUrl" value="" type="hidden">
        <input name="readerId" value="" type="hidden">
        <input name="loginUrl" value="/login?location=https%3A%2F%2Fwww.mywebsite.com%2F" type="hidden">
        <div class="lgn-box__title">
            <h1 class="lgn-heading--alpha">Sign in</h1>
        </div>
        <div class="o-forms-group">
            <label for="email" class="o-forms-label">Email address</label>
            <input id="email" class="o-forms-text js-email" name="email" maxlength="64" autocomplete="off" autofocus="" required="" type="email">
            <input id="password" name="password" style="display:none" type="password">
            <label for="password">
        </label></div>
        <div class="o-forms-group">
            <button class="o-buttons o-buttons--standout o-buttons--big" type="submit" name="Next">Next</button>
        </div>
    </form>
    

  • Step 2: enter the password on the second page that comes back. The second page's source is:

     <div id="content" class="js-container" data-component="two-step-login-form"> 
      <div class="lgn-box">
       <form name="enter-password-form" action="/login?location=https%3A%2F%2Fwww.mywebsite.com%2F" method="POST" data-test-id="enter-password-form" novalidate="">
        <input name="location" value="https://www.mywebsite.com/" type="hidden">
        <input name="continueUrl" value="" type="hidden">
        <input name="readerId" value="" type="hidden">
        <div class="lgn-box__title">
            <h1 class="lgn-heading--alpha">Sign in</h1>
        </div>
        <div class="o-forms-group">
            <a href="/login?location=https%3A%2F%2Fwww.mywebsite.com%2F" class="js-change-email lgn-typography-bold lgn-typography-big lgn-typography-link--back">Back</a>
        </div>
        <div class="o-forms-group">
            <label for="email" class="o-forms-label">Email</label>
            <input readonly="" class="js-email o-forms-text o-forms-unskin lgn-typography-bold lgn-typography-big lgn-typography-truncate" id="email" value="" name="email" type="text">
        </div>
        <div class="o-forms-group">
            <label for="password" class="o-forms-label">Password</label>
            <input id="password" name="password" class="o-forms-text" maxlength="50" autofocus="" required="" type="password">
            <small class="o-forms-additional-info"><a href="/reset-password">Forgot your password?</a></small>
        </div>
        <div class="o-forms-group lgn-utils-pack">
            <button class="o-buttons o-buttons--standout o-buttons--big" type="submit" name="Sign in">Sign in</button>
            <div class="lgn-typography-align-right">
                <input class="o-forms-checkbox" name="rememberMe" id="rememberMe" value="true" checked="" type="checkbox">
                <label for="rememberMe" class="o-forms-label lgn-utils-remove-margin">Remain signed in</label>
            </div>
        </div>
    </form>
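Reading the two forms above, each step posts to that form's own `action` URL and sends every named `<input>` in it, hidden ones included. The sketch below writes those two requests out by hand. It assumes the site lives at `https://www.mywebsite.com` (the question only shows placeholder hosts) and copies the hidden values from the HTML shown; on the real site they should be read from the page actually returned, since the server may fill them in.

```python
from urllib.parse import urljoin

# Assumption: hypothetical host; the question only shows placeholders.
LOGIN_PAGE = "https://www.mywebsite.com/login"

# Step 1: "enter-email-form" posts to its action, /login/submitEmail (not /login).
step1_url = urljoin(LOGIN_PAGE, "/login/submitEmail")
step1_data = {
    "location": "https://www.mywebsite.com/",
    "continueUrl": "",
    "readerId": "",
    "loginUrl": "/login?location=https%3A%2F%2Fwww.mywebsite.com%2F",
    "email": "myemail@a.com",
    # The display:none password input is still a named field, so a browser sends it empty.
    "password": "",
}

# Step 2: "enter-password-form" has a different action and a different field set.
step2_url = urljoin(step1_url, "/login?location=https%3A%2F%2Fwww.mywebsite.com%2F")
step2_data = {
    "location": "https://www.mywebsite.com/",
    "continueUrl": "",
    "readerId": "",
    "email": "myemail@a.com",
    "password": "*****",
    "rememberMe": "true",  # the checkbox is checked by default
}

# A browser may also send the clicked submit button's name (e.g. "Next");
# compare with the request DevTools shows before relying on this list.
print(step1_url)  # https://www.mywebsite.com/login/submitEmail
print(step2_url)  # https://www.mywebsite.com/login?location=https%3A%2F%2Fwww.mywebsite.com%2F
```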
    

To do this, I use the following code:

import requests, lxml.html

with requests.Session() as s:
    login = s.get('https://mywebsite/login')
    login_html = lxml.html.fromstring(login.text)
    hidden_inputs = login_html.xpath(r'//form//input[@type="email"]')
    form = {x.attrib["name"]: x.attrib["value"] for x in hidden_inputs}
    form['email'] = 'myemail@a.com'
    response = s.post('https://mywebsite/login', data=form)
    # -> 400 Bad Request
    login_html = lxml.html.fromstring(response.text)
    hidden_inputs = login_html.xpath(r'//form//input[@type="password"]')
    form = {x.attrib["name"]: x.attrib["value"] for x in hidden_inputs}
    form['password'] = '*****'
    response = s.post('https://website/login', data=form)
    # -> 400 Bad Request
    print(form)
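One way to automate the two-step flow, sketched with the standard library's `html.parser` in place of lxml so it needs no extra dependency: parse each returned page, find the form by its `name` attribute, collect its `<input>` fields (skipping unchecked checkboxes, as a browser does), and POST them to the form's resolved `action` rather than to a fixed `/login`. The form names come from the HTML in the question; `read_form`, `two_step_login` and the login URL are hypothetical. This is an untested sketch against the real site, which may also depend on cookies, headers or JavaScript it does not reproduce.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormFields(HTMLParser):
    """Collect the action and the <input> name/value pairs of one named <form>."""

    def __init__(self, form_name):
        super().__init__()
        self.form_name = form_name
        self.in_form = False
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("name") == self.form_name:
            self.in_form = True
            self.action = attrs.get("action", "")
        elif tag == "input" and self.in_form and attrs.get("name"):
            # Browsers do not submit unchecked checkboxes.
            if attrs.get("type") == "checkbox" and "checked" not in attrs:
                return
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def read_form(html, form_name, page_url):
    """Return (absolute action URL, field dict) for the form called form_name."""
    parser = FormFields(form_name)
    parser.feed(html)
    if parser.action is None:
        raise ValueError(f"form {form_name!r} not found")
    return urljoin(page_url, parser.action), parser.fields


def two_step_login(session, login_url, email, password):
    """Submit the email form, then the password form, each to its own action URL."""
    page = session.get(login_url)
    url, data = read_form(page.text, "enter-email-form", page.url)
    data["email"] = email
    page = session.post(url, data=data)
    page.raise_for_status()

    url, data = read_form(page.text, "enter-password-form", page.url)
    data["email"] = email  # readonly field; the HTML in the question has value=""
    data["password"] = password
    page = session.post(url, data=data)
    page.raise_for_status()
    return page
```

With requests, pass the session from `with requests.Session() as s:` as `session`, e.g. `two_step_login(s, "https://www.mywebsite.com/login", "myemail@a.com", "*****")`. Taking the session as a parameter also makes the flow easy to exercise with a fake session.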

Any idea how to handle this?

Thanks

(Python 3.5)

1 answer:

Answer (score: 1)

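This isn't really the right approach. What you want to do is log in with DevTools or Firebug open and look at the request headers to see what the POST of your credentials needs, then put those fields into a dictionary and submit the request. For example, here is the DevTools output for this very page as I write this: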

General
Request URL:http://stackoverflow.com/posts/validate-body
Request Method:POST
Status Code:200 OK
Remote Address:151.101.129.69:80

Response Headers
Accept-Ranges:bytes
Cache-Control:private
Connection:keep-alive
Content-Length:54
Content-Type:application/json
Date:Wed, 05 Apr 2017 17:53:29 GMT
Pragma:no-cache
Vary:Fastly-SSL
Via:1.1 varnish
X-Cache:MISS
X-Cache-Hits:0
X-DNS-Prefetch-Control:off
X-Frame-Options:SAMEORIGIN
X-Request-Guid:598a7e9d-7775-4ab9-9a8d-3c25d6e1984e
X-Served-By:cache-sjc3629-SJC
X-Timer:S1491414810.679660,VS0,VE141

Request Headers
Accept:*/*
Accept-Encoding:gzip, deflate
Accept-Language:en-US,en;q=0.8
Connection:keep-alive
Content-Length:100
Content-Type:application/x-www-form-urlencoded; charset=UTF-8
Cookie:prov=40a191ef-3cbb-7b00-af35-93d7cc8df595; __qca=P0-1351948118-1491163901240; _ga=GA1.2.2042334979.1491163901; acct=t=5U3Vk9gxgP4JVyr3WBuiQjdq6athwXsO&s=dQh5bzT%2foR1RhIOInkGQZDxQ9XdG7gUv
DNT:1
Host:stackoverflow.com
Origin:http://stackoverflow.com
Referer:http://stackoverflow.com/questions/43231181/login-in-to-website-using-python-requests-400-bad-request
User-Agent:Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36
X-Requested-With:XMLHttpRequest

Form Data
body:This isn't really the right approach. What you want to do is inspect 
oldBody:
isQuestion:false
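DevTools can also show the Form Data in URL-encoded form. A small sketch, using only `urllib.parse`, of turning such a string into the payload dictionary that `requests` takes as `data=`; the captured string here is made up for illustration.

```python
from urllib.parse import parse_qsl, urlencode

# Made-up example of a URL-encoded Form Data string copied from DevTools.
captured = "body=Hello%2C+world&oldBody=&isQuestion=false"

# keep_blank_values=True keeps empty fields such as oldBody=, which the server may expect.
payload = dict(parse_qsl(captured, keep_blank_values=True))
print(payload)  # {'body': 'Hello, world', 'oldBody': '', 'isQuestion': 'false'}

# requests form-encodes a data= dict with urlencode as well,
# so re-encoding should reproduce the captured body.
print(urlencode(payload) == captured)  # True
```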

What you mainly need are the Request URL and Form Data sections. So in this case you would do something like this:

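import requests

s = requests.Session()
# Field names and values copied from the "Form Data" section above
payload = {'body': "The answer I'm posting to your question", 'oldBody': '', 'isQuestion': 'false'}
# POST to the "Request URL" shown in DevTools
response = s.post('http://stackoverflow.com/posts/validate-body', data=payload)
print(response.content)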
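The Request Headers can be reused the same way, for sites that check them. The sketch below, a hypothetical helper `headers_from_devtools`, turns the `Name:value` lines DevTools shows into a dict suitable for `headers=` or `s.headers.update(...)`. It skips headers that requests computes or manages itself; which headers a given site actually checks is an assumption to verify case by case.

```python
# Headers requests computes or manages itself (cookies live in the Session);
# Accept-Encoding is skipped because requests may not decode every encoding a browser lists.
SKIP = {"content-length", "cookie", "host", "connection", "accept-encoding"}


def headers_from_devtools(block):
    """Turn DevTools 'Name:value' request-header lines into a dict for headers=."""
    headers = {}
    for line in block.strip().splitlines():
        name, sep, value = line.partition(":")
        name = name.strip()
        # Skip non-header lines ("view source"), HTTP/2 pseudo-headers (":authority")
        # and the headers listed in SKIP.
        if not sep or not name or name.lower() in SKIP:
            continue
        headers[name] = value.strip()
    return headers


captured = """
Accept:*/*
Content-Length:100
Host:stackoverflow.com
Referer:http://stackoverflow.com/questions/43231181/login-in-to-website-using-python-requests-400-bad-request
X-Requested-With:XMLHttpRequest
"""

print(sorted(headers_from_devtools(captured)))  # ['Accept', 'Referer', 'X-Requested-With']
```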