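I'm trying to use the `requests` module to fetch information from a website. You have to log in before you can access the page that holds the information. I looked at the input tags and found that they are named `login_username` and `login_password`, but for some reason the `post` does not go through. I also read here that someone solved this by waiting a few seconds before navigating to another page, but that did not help either.
Here is my code: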
import requests
import time
# This is the URL that your login form points to in its "action" attribute.
loginurl = 'https://jadepanel.nephrite.ro/login'
#This URL is the page you actually want to pull down with requests.
requesturl = 'https://jadepanel.nephrite.ro/clan/view/123'
payload = {
    'login_username': 'username',
    'login_password': 'password'
}
with requests.Session() as session:
    post = session.post(loginurl, data=payload)
    time.sleep(3)
    r = session.get(requesturl)
    print(r.text)
Answer 0 (score: 2)
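`login_username` and `login_password` are not the only required parameters. If you look at the `/login/` POST request in the browser developer tools, you will see that a `_token` is being sent as well.
This is something you need to parse out of the login HTML. So the flow is:
1. Get the https://jadepanel.nephrite.ro/login page.
2. Parse the HTML and extract the `_token` value.
3. Make a POST request with the username, password, and token.
4. Use the logged-in session to request the target page.
For HTML parsing, we can use `BeautifulSoup` (there are other options, of course):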
from bs4 import BeautifulSoup
login_html = session.get(loginurl).text
soup = BeautifulSoup(login_html, "html.parser")
token = soup.find("input", {"name": "_token"})["value"]
payload = {
    'login_username': 'username',
    'login_password': 'password',
    '_token': token
}
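As one of the "other options" for parsing, here is a minimal standard-library sketch that collects every hidden input from a login page, so the payload picks up `_token` along with any other hidden field the form might carry. The `HiddenInputCollector` class, the `hidden_fields` helper, and the sample HTML are made up for illustration; the real login page may be structured differently.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def hidden_fields(html):
    """Return a dict of all hidden input fields found in the given HTML."""
    collector = HiddenInputCollector()
    collector.feed(html)
    collector.close()
    return collector.fields


# Hypothetical login form, made up for illustration; the real page may differ.
SAMPLE_LOGIN_HTML = """
<form method="post" action="/login">
  <input type="hidden" name="_token" value="abc123">
  <input type="text" name="login_username">
  <input type="password" name="login_password">
</form>
"""

# Start from the hidden fields, then add the credentials.
payload = hidden_fields(SAMPLE_LOGIN_HTML)
payload.update({"login_username": "username", "login_password": "password"})
print(payload)
```

With a live session, `SAMPLE_LOGIN_HTML` would be replaced by `session.get(loginurl).text`.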
Full code:
import time
import requests
from bs4 import BeautifulSoup
# This is the URL that your login form points to in its "action" attribute.
loginurl = 'https://jadepanel.nephrite.ro/login'
# This URL is the page you actually want to pull down with requests.
requesturl = 'https://jadepanel.nephrite.ro/clan/view/123'
with requests.Session() as session:
    login_html = session.get(loginurl).text
    soup = BeautifulSoup(login_html, "html.parser")
    token = soup.find("input", {"name": "_token"})["value"]
    payload = {
        'login_username': 'username',
        'login_password': 'password',
        '_token': token
    }
    post = session.post(loginurl, data=payload)
    time.sleep(3)
    r = session.get(requesturl)
    print(r.text)
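Because the original symptom was a POST that silently failed to log in, it can help to check the login response before requesting the target page. The following is only a heuristic sketch: it assumes (this is not confirmed for this site) that a failed login lands back on the login URL with the form rendered again. `login_looks_successful` and its `marker` parameter are hypothetical names, and the `SimpleNamespace` objects merely stand in for `requests.Response` objects by mimicking their `status_code`, `url`, and `text` attributes.

```python
from types import SimpleNamespace

LOGIN_URL = "https://jadepanel.nephrite.ro/login"


def login_looks_successful(response, login_url, marker='name="login_password"'):
    """Heuristic: treat the login as failed on an HTTP error status, or when
    we are still on the login URL and the password field is shown again."""
    if response.status_code >= 400:
        return False
    still_on_login = response.url.rstrip("/") == login_url.rstrip("/")
    form_shown_again = marker in response.text
    return not (still_on_login and form_shown_again)


# Stand-ins for requests.Response objects; the bodies are invented for illustration.
failed = SimpleNamespace(
    status_code=200,
    url="https://jadepanel.nephrite.ro/login",
    text='<input type="password" name="login_password">',
)
succeeded = SimpleNamespace(
    status_code=200,
    url="https://jadepanel.nephrite.ro/",
    text="<h1>Welcome</h1>",
)

print(login_looks_successful(failed, LOGIN_URL))     # False
print(login_looks_successful(succeeded, LOGIN_URL))  # True
```

In the full code above, this could be called as `login_looks_successful(post, loginurl)` right after `session.post(...)`, since `requests` follows redirects by default and `post.url` holds the final URL.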