Python: Trying to log in and perform HTTP requests using requests

Time: 2017-04-06 13:41:14

Tags: python python-3.x http beautifulsoup python-requests

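I am trying to log in to my account with the Python code below, without success. The login is a two-step process spread over two pages: the first page asks for the login (email), the second for the password. I am using Python 3: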

from bs4 import BeautifulSoup
import lxml.html
import requests

with requests.Session() as s:
    # First login page
    login = s.get('https://accounts.ft.com/login')
    login_html = lxml.html.fromstring(login.text)
    # Collect the form inputs
    hidden_inputs = login_html.xpath(r'//form//input')
    form = {x.name: x.value for x in hidden_inputs}
    # Fill in the email
    form['email'] = 'me@mail.com'
    response = s.post('https://accounts.ft.com/login', data=form)
    # Receives response 200

    # Second login page
    login_html = lxml.html.fromstring(response.text)
    # Collect the form inputs
    hidden_inputs = login_html.xpath(r'//form//input')
    form = {x.name: x.value for x in hidden_inputs}
    # Fill in the email and password
    form['email'] = 'me@mail.com'
    form['password'] = 'p****word'
    response = s.post('https://accounts.ft.com/login', data=form)
    # Receives response 200

    # Try to read an article while logged in
    page = s.get('https://www.ft.com/content/173695cc-1a98-11e7-a266-12672483791a')
    soup = BeautifulSoup(page.content, 'html.parser')
    print(soup.prettify())
    # data-next-is-logged-in="false" => Please Register to read this page...
  • Here is the content of the form:

<div class="js-container" data-component="two-step-login-form" id="content">
  <div class="lgn-box">
    <form action="/login/submitEmail" class="js-email-lookup-form" data-test-id="enter-email-form" method="POST" name="enter-email-form" novalidate="">
      <input name="location" type="hidden" value="" />
      <input name="continueUrl" type="hidden" value="" />
      <input name="readerId" type="hidden" value="" />
      <input name="loginUrl" type="hidden" value="/login" />
      <div class="lgn-box__title">
        <h1 class="lgn-heading--alpha">
          Sign in
        </h1>
      </div>
      <div class="o-forms-group">
        <label class="o-forms-label" for="email">
       Email address
      </label>
        <input autocomplete="off" autofocus="" class="o-forms-text js-email" id="email" maxlength="64" name="email" required="" type="email">
        <input id="password" name="password" style="display:none" type="password">
        <label for="password">
        </label>
        </input>
        </input>
      </div>
      <div class="o-forms-group">
        <button class="o-buttons o-buttons--standout o-buttons--big" name="Next" type="submit">
       Next
      </button>
      </div>
    </form>
  </div>

  • Here is the data I pass to the POST:

      
        

    form = {'password': 'p****word', 'continueUrl': '', 'loginUrl': '/login', 'email': 'me@mail.com', 'readerId': '', 'location': ''}

      
  • The POST requests for both the first and the second login page return a 200 response, but it seems I am still not logged in.

  • I tried using http://accounts.ft.com/sso/redirects?email=me@mail.com as the URL for the POST request; it returned a 405 Bad Request error.

  • I am not sure whether I am really not logged in, and I do not know how to check it.

  • Is it possible that the website blocks my login when the request does not come from a web browser?
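Two of the points above can be checked mechanically. The form markup shown earlier declares action="/login/submitEmail", whereas the code posts to /login, so resolving the real submit URL from the page may be worth ruling out as a cause. The sketch below uses only the standard library; `FormActionParser`, `form_post_url` and `looks_logged_in` are hypothetical helper names, and treating `data-next-is-logged-in="true"` as the logged-in marker is an assumption (the question only shows the "false" value).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormActionParser(HTMLParser):
    """Collect the action attribute of every <form> on a page."""

    def __init__(self):
        super().__init__()
        self.actions = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            # A missing action attribute is recorded as an empty string.
            self.actions.append(dict(attrs).get("action") or "")


def form_post_url(page_url, html):
    """Return the absolute URL that the first form on the page submits to."""
    parser = FormActionParser()
    parser.feed(html)
    if not parser.actions:
        raise ValueError("no <form> found in the page")
    # urljoin resolves relative actions; an empty action yields page_url itself.
    return urljoin(page_url, parser.actions[0])


def looks_logged_in(html):
    """Assumption: a logged-in page carries data-next-is-logged-in="true"."""
    return 'data-next-is-logged-in="true"' in html


sample = '<form action="/login/submitEmail" method="POST"></form>'
print(form_post_url("https://accounts.ft.com/login", sample))
# https://accounts.ft.com/login/submitEmail
```

With a live session, the same idea would mean posting to `form_post_url(login.url, login.text)` instead of a hard-coded URL, and inspecting `s.cookies` and `response.history` after each step to see whether the server set anything or redirected.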

1 Answer:

Answer 0: (score: 1)

Try using selenium to simulate a web browser, since FT blocks automated access.
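A rough sketch of what that could look like with Selenium 4 follows. The field names "email" and "password" come from the form markup in the question; the function name `login_with_browser`, the use of Firefox, the password field becoming visible after the first submit, and a URL change after a successful login are all assumptions, and the site may still detect an automated browser.

```python
LOGIN_URL = "https://accounts.ft.com/login"  # login page from the question


def login_with_browser(email, password, timeout=10):
    """Drive the two-step login in a real browser and return its cookies."""
    # Imported lazily so this file can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()  # assumes Firefox and geckodriver are available
    try:
        driver.get(LOGIN_URL)
        wait = WebDriverWait(driver, timeout)

        # Step 1: enter the email and submit the form.
        email_box = wait.until(
            EC.visibility_of_element_located((By.NAME, "email"))
        )
        email_box.send_keys(email)
        email_box.submit()

        # Step 2: wait until the password field is visible, then submit again.
        password_box = wait.until(
            EC.visibility_of_element_located((By.NAME, "password"))
        )
        password_box.send_keys(password)
        url_before = driver.current_url
        password_box.submit()

        # Assumption: a successful login navigates away from the current URL.
        wait.until(EC.url_changes(url_before))
        return driver.get_cookies()
    finally:
        driver.quit()
```

Calling `login_with_browser('me@mail.com', 'p****word')` would open a browser window, so it is left out of the block itself; the returned cookie dictionaries could then be inspected to see whether a session was actually established.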

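Alternatively, you can check whether the page has already been archived with something like archive.is (which pulls most websites into a more machine-friendly form).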

Finally, FT offers a data mining API and a headline API: developer page