Question

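I am trying to log in to https://www.voxbeam.com/login using requests so that I can scrape data. I am a Python beginner; I have mostly done tutorials, plus some web scraping on my own with BeautifulSoup.

Looking at the HTML: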

<form id="loginForm" action="https://www.voxbeam.com//login" method="post" autocomplete="off">

<input name="userName" id="userName" class="text auto_focus" placeholder="Username" autocomplete="off" type="text">

<input name="password" id="password" class="password" placeholder="Password" autocomplete="off" type="password">

<input id="challenge" name="challenge" value="78ed64f09c5bcf53ead08d967482bfac" type="hidden">

<input id="hash" name="hash" type="hidden">

I understand that I should use the POST method and send userName and password.

I am trying this:

import requests
import webbrowser

url = "https://www.voxbeam.com/login"
login = {'userName': 'xxxxxxxxx',
         'password': 'yyyyyyyyy'}

print("Original URL:", url)

r = requests.post(url, data=login)

print("\nNew URL", r.url)
print("Status Code:", r.status_code)
print("History:", r.history)

print("\nRedirection:")
for i in r.history:
    print(i.status_code, i.url)

# Open r in the browser to check if I logged in
new = 2  # open in a new tab, if possible
webbrowser.open(r.url, new=new)

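I expected that, after a successful login, r would hold the URL of the dashboard, so that I could start scraping the data I need.

When I run the code with my real credentials in place of xxxxxx and yyyyyy, I get the following output: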

Original URL: https://www.voxbeam.com/login

New URL https://www.voxbeam.com/login
Status Code: 200
History: []

Redirection:

Process finished with exit code 0

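and a new browser tab opens at www.voxbeam.com/login.

Is there something wrong in the code? Am I missing something in the HTML? Is it reasonable to expect the dashboard URL in r, or a redirect, and to open that URL in a browser tab to check the response visually, or should I be doing things differently?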

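I have been reading many similar questions here for a couple of days, but it seems every website's authentication process is a bit different. I checked http://docs.python-requests.org/en/latest/user/authentication/, which describes other methods, but I have not found anything in the HTML suggesting that I should use one of those instead of POST.

I also tried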

r = requests.get(url, auth=('xxxxxxxx', 'yyyyyyyy'))

but that does not seem to work either.

Answer 1

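As stated above, you should send values for all of the form's fields. You can find them in the browser's web inspector. This form sends 2 additional hidden values: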

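import requests

url = "https://www.voxbeam.com//login"
# Placeholder user-agent; copy the one your own browser sends
headers = {'user-agent': "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36"}
# 'challenge' is the value of the hidden input on the login page
data = {'userName': 'xxxxxxxxx', 'password': 'yyyyyyyyy', 'challenge': 'zzzzzzzzz', 'hash': ''}
# The web inspector shows '@' in an email as '%40' (uuuuuuu%40gmail.com);
# requests URL-encodes form data itself, so pass the plain '@' here

session = requests.Session()
r = session.post(url, headers=headers, data=data)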
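
The `challenge` value presumably changes on each page load (an assumption; the question only shows one value), so it is safer to read it from a freshly fetched login page than to hard-code it. Here is a minimal sketch using only the standard library's `html.parser`. The names `FormFieldParser`, `build_login_payload` and `log_in` are hypothetical, and `session` is assumed to be a `requests.Session`. The sketch does not compute the `hash` field: if the site fills it in with JavaScript before submitting, this alone will not be enough.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect name/value pairs of <input> elements inside one <form>."""

    def __init__(self, form_id):
        super().__init__()
        self.form_id = form_id
        self.in_form = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == self.form_id:
            self.in_form = True
        elif tag == "input" and self.in_form and attrs.get("name"):
            # Inputs without a value attribute are submitted as empty strings
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def build_login_payload(html, username, password, form_id="loginForm"):
    """Return every input of the login form, with the credentials filled in."""
    parser = FormFieldParser(form_id)
    parser.feed(html)
    payload = dict(parser.fields)
    payload["userName"] = username
    payload["password"] = password
    return payload


def log_in(session, url, username, password):
    """Fetch the login page, then post the form back in the same session.

    `session` is assumed to be a requests.Session, so cookies set by the
    first request are sent again with the POST.
    """
    page = session.get(url)
    payload = build_login_payload(page.text, username, password)
    return session.post(url, data=payload)
```

With requests installed, `log_in(requests.Session(), url, 'xxxxxxxxx', 'yyyyyyyyy')` would return the response to the POST; its `url` and `history` attributes can then be inspected as in the question.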

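Also, many sites have protection against bots, such as hidden form fields, JavaScript, sending encoded values, etc. As alternatives, you could:

1) Use the cookies from a manual login: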

url = "https://www.voxbeam.com"
headers = {'user-agent': "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36"}
cookies = {'PHPSESSID':'zzzzzzzzzzzzzzz', 'loggedIn':'yes'}

s = requests.Session()
r = s.post(url, headers=headers, cookies=cookies)

2) Use the Selenium module:

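from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

url = "https://www.voxbeam.com//login"
driver = webdriver.Firefox()
driver.get(url)

# find_element_by_name() was removed in Selenium 4.3; use find_element(By.NAME, ...)
u = driver.find_element(By.NAME, 'userName')
u.send_keys('xxxxxxxxx')
p = driver.find_element(By.NAME, 'password')
p.send_keys('yyyyyyyyy')
p.send_keys(Keys.RETURN)  # pressing Enter submits the form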
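
If the goal is to keep scraping with requests and BeautifulSoup after Selenium has logged in, the browser's cookies can be handed over to a requests session. This is a rough sketch: `cookies_for_requests` is a hypothetical helper, and it assumes that the site's login state lives entirely in cookies, which the source does not confirm.

```python
def cookies_for_requests(selenium_cookies):
    """Turn Selenium's list of cookie dicts into a plain name -> value dict.

    driver.get_cookies() returns dicts that include 'name' and 'value' keys;
    the other keys (domain, path, expiry, ...) are ignored here.
    """
    return {cookie["name"]: cookie["value"] for cookie in selenium_cookies}


# Usage sketch (needs selenium and requests, and a driver that has logged in):
#   import requests
#   session = requests.Session()
#   session.cookies.update(cookies_for_requests(driver.get_cookies()))
#   r = session.get("https://www.voxbeam.com")
```

Dropping the domain and path is a simplification; it is fine when every request goes to the same site.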

Answer 2

Try to specify the URL more explicitly, like this:

  url = "https://www.voxbeam.com//login?id=loginForm"

This will set the focus on the login form so that the POST method applies.

Answer 3

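It depends on how the site handles the login process, which can be quite tricky. What I did was use Charles, a proxy application, to listen to the requests my browser sent to the website's server while I logged in manually. I then copied exactly the same headers and cookies shown in Charles into my own Python code, and it worked! I assume the cookies and headers are used to prevent bots from logging in.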
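
As an illustration of the copying step, the text copied from the proxy can be turned into the dictionaries that requests expects. This is only a sketch: `headers_from_lines` and `cookies_from_header` are hypothetical helpers, and the input format (one `Name: value` header per line, cookies in a single `name=value; name=value` string) is an assumption about what gets copied.

```python
def headers_from_lines(raw_headers):
    """Parse 'Name: value' lines into a dict, skipping anything malformed."""
    headers = {}
    for line in raw_headers.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip():
            headers[name.strip()] = value.strip()
    return headers


def cookies_from_header(raw_cookie_header):
    """Parse the value of a Cookie header ('a=1; b=2') into a dict."""
    cookies = {}
    for part in raw_cookie_header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies
```

The two dicts can then be passed to a request as `session.get(url, headers=headers, cookies=cookies)`.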

Answer 4

from webbot import Browser

web = Browser()  # opens a browser window controlled from Python

web.go_to('enter your login page url')

# Click the button that opens the login form. Use the text shown on your site:
# 'login' in my case, 'signin' if that is what the button says.
web.click('login')

# id='txtLoginId' varies from site to site; find it by inspecting the
# Id/Username/Email field. In my case it is txtLoginId.
web.type('enter your Id/Username/Emailid', into='Id/Username/Emailid', id='txtLoginId')

web.click('NEXT', tag='span')

# id='txtpasswrd' also varies from site to site; inspect the Password field.
web.type('Enter Your Password', into='Password', id='txtpasswrd')

# Inspect the remaining buttons and click through them as needed;
# in my case the id is "fa fa-home".
web.click('NEXT', id="fa fa-home", tag='span')
web.click('NEXT', tag='span')

Log in to a website using Python requests

4 answers: