Python web scraping: automatic login with requests is not working

Time: 2017-11-04 06:52:45

Tags: python python-3.x web-scraping beautifulsoup python-requests

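I have been trying to scrape a website with the Python requests module, and I need to log in to the site to retrieve the data I want. I have looked everywhere but cannot work out why the login does not work. This is my code so far: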

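import requests
import bs4 as bs

login_url = "__withheld__"
target_url = "__withheld__"

# Field names match the name attributes of the inputs on the login page
login_data = {"username": "my_username", "password": "my_password"}

with requests.Session() as s:
    # Load the login page first so the session picks up any cookies
    page = s.get(login_url)
    # Submit the credentials to the same URL
    page_login = s.post(login_url, data=login_data)
    # Request the protected page with the same session
    page = s.get(target_url)
    final_page = bs.BeautifulSoup(page.content, 'lxml')
    print(final_page.title)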

This is the HTML for the username and password fields:

<input name="username" type="text" id="username" class="metro-input" placeholder="Username" value="">
<span id="username-error" class=""></span>
<label class="ie789Only"> Password</label>
<input name="password" type="password" id="password" class="metro-input" placeholder="Password">
<input type="submit" name="button1" value="Sign in" id="button1" class="metro-button">

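I think it may have something to do with the site requiring the user to click a button, but I cannot find any solution. I also looked in the developer console for any POST form data while logging in myself, but did not find a clear form carrying the password/username. Any help is appreciated.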

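Update: in case it helps, here is a link to a site run by the same company (the original is withheld for privacy) that has the same security features: https://ashwood-vic.compass.education/login.aspx?sessionstate=disabled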
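One thing worth checking for the situation described above: a browser submits every named input in the login form, including hidden ones and the submit button itself (`button1` in the excerpt), whereas the posted `login_data` contains only `username` and `password`. The sketch below collects all named inputs from a page and overlays the credentials. The helper names (`FormFieldCollector`, `build_login_payload`, `login`) are hypothetical, and whether this particular site has hidden fields is an assumption; the excerpt in the question does not show any. It uses the standard library's `html.parser` so that it runs without extra packages; BeautifulSoup could collect the same fields.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs from every named <input> tag on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        name = attr.get("name")
        if name:
            # An input without a value attribute is treated as an empty string.
            self.fields[name] = attr.get("value") or ""


def build_login_payload(html, username, password):
    """Return all named inputs found in html, with the credentials filled in.

    Simplification: this takes every <input> on the page, not only those
    inside the login <form>, and does not special-case checkboxes or radios.
    """
    collector = FormFieldCollector()
    collector.feed(html)
    payload = dict(collector.fields)
    # Field names taken from the HTML excerpt in the question.
    payload["username"] = username
    payload["password"] = password
    return payload


def login(session, login_url, username, password):
    """Fetch the login page, then post back everything a browser would send.

    session is expected to be a requests.Session (hypothetical usage).
    """
    page = session.get(login_url)
    payload = build_login_payload(page.text, username, password)
    return session.post(login_url, data=payload)
```

With a `requests.Session()` named `s`, a call such as `login(s, login_url, "my_username", "my_password")` would stand in for the `s.get` / `s.post` pair in the code above. If the form's `action` attribute points to a different URL, the POST would need to go there instead.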

1 answer:

Answer 0: (score: 0)

You can try the following code:

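import requests
import bs4 as bs

login_url = "__withheld__"  # same login URL as in the question
username = 'username of the site'
password = 'password of the site'

# auth=(username, password) sends the credentials via HTTP Basic Authentication
req = requests.get(login_url, auth=(username, password))
final_page = bs.BeautifulSoup(req.content, 'lxml')
print(final_page.title)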

- For reference, see http://docs.python-requests.org/en/master/user/authentication/#basic-authentication
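To make clear what the `auth=(username, password)` tuple in the code above does: requests turns it into an HTTP Basic Authentication header. It does not fill in an HTML form. The sketch below rebuilds that header by hand with the standard library; `basic_auth_header` is a hypothetical helper used only for illustration. If the site logs users in through a form like the one in the question, the server may simply ignore this header.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header that HTTP Basic Authentication uses."""
    # The credentials are joined with a colon and base64-encoded (not encrypted).
    raw = f"{username}:{password}".encode("latin1")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": "Basic " + token}


print(basic_auth_header("user", "pass"))
```

A quick way to tell whether a site expects Basic Authentication is to look at the response to an unauthenticated request: such servers normally answer with status 401 and a `WWW-Authenticate: Basic` header rather than an HTML login page.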