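I have to scrape data from a website that requires a login.
This is the code I'm currently using, but it doesn't return the HTML of the page behind the login.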
from requests import Session
from bs4 import BeautifulSoup as bs

with Session() as s:
    site = s.get("https://www.valueresearchonline.com/membership/getin.asp?ref=%2Fport_v1%2Fdefault%2Easp%3Fselv%3D8%26poid%3D1443091")
    bs_content = bs(site.content, "html.parser")
    token = bs_content.find("input", {"name": "ref"})["value"]
    login_data = {"username": "<username>", "password": "<password>", "ref": token}
    p = s.post("https://www.valueresearchonline.com/membership/getin.asp?ref=%2Fport_v1%2Fdefault%2Easp%3Fselv%3D8%26poid%3D1443091", login_data)
    print(p.text)
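The HTML I get back is the same as the HTML before logging in. I'm also not sure whether the site needs the token part, so I tried once with it and once without it, but in both cases I got the result described above.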
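One common reason a login POST just returns the login page again is that the data is sent to the wrong URL: a browser submits to whatever the form's `action` attribute says, which can differ from the URL of the page showing the form. The sketch below uses only the standard library to list the absolute URL each `<form>` on a page would submit to. The sample markup is hypothetical and not copied from the real site; in practice you would pass `site.text` from the `s.get(...)` call in the code above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormActionFinder(HTMLParser):
    """Collects the action attribute of every <form> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.actions = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            # A form without an action attribute submits to the page's own URL.
            self.actions.append(dict(attrs).get("action") or "")


def form_targets(page_url, html):
    """Return the absolute URL each form on the page would POST to."""
    finder = FormActionFinder()
    finder.feed(html)
    # Relative actions are resolved against the page URL, as a browser would.
    return [urljoin(page_url, action) for action in finder.actions]


# Hypothetical markup, for illustration only.
sample = '<form method="post" action="/registration/loginprocess.asp"><input name="ref"></form>'
print(form_targets("https://www.valueresearchonline.com/membership/getin.asp", sample))
```

If the printed target differs from the URL you are posting to, post to the target instead.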
Answer 0 (score: 0)
In p = s.post("https://www.valueresearchonline.com/membership/getin.asp?ref=%2Fport_v1%2Fdefault%2Easp%3Fselv%3D8%26poid%3D1443091",login_data), add one more parameter, allow_redirects=True, and change the URL to https://www.valueresearchonline.com/registration/loginprocess.asp:
p = s.post("https://www.valueresearchonline.com/registration/loginprocess.asp", data=login_data, allow_redirects=True)
Check whether this works for you.
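A note on this answer: in requests, `post` already follows redirects by default, so `allow_redirects=True` is redundant and the changed URL is probably the part that matters. To tell whether a login attempt actually worked, a rough heuristic is to check whether the response landed back on the login page or still contains a password field. The helper name `still_on_login_page` and the default `login_path="getin.asp"` (taken from the question's URL) are assumptions for illustration, not anything the site documents.

```python
from html.parser import HTMLParser


class PasswordFieldDetector(HTMLParser):
    """Sets found=True if the page contains an <input type="password">."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        input_type = (dict(attrs).get("type") or "").lower()
        if tag == "input" and input_type == "password":
            self.found = True


def still_on_login_page(final_url, html, login_path="getin.asp"):
    """Heuristic: the login probably failed if we ended up on the login URL
    (login_path is an assumed marker) or the HTML still asks for a password."""
    detector = PasswordFieldDetector()
    detector.feed(html)
    return login_path in final_url or detector.found
```

With requests you could call `still_on_login_page(p.url, p.text)`; `p.history` additionally lists any redirect responses that were followed on the way.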
Answer 1 (score: 0)
Put your email and password in as the values of payload['username'] and payload['password']; I think that should log you in.
import requests
from bs4 import BeautifulSoup

url = "https://www.valueresearchonline.com/membership/getin.asp"
post_url = "https://www.valueresearchonline.com/registration/loginprocess.asp"

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0'
    site = s.get(url)
    soup = BeautifulSoup(site.text, "lxml")
    # Collect every named <input> on the login page (hidden fields included) with its default value
    payload = {item['name']: item.get('value', '') for item in soup.select('input[name]')}
    # Overwrite the credential fields with your own values
    payload['username'] = 'your email'
    payload['password'] = 'your password'
    p = s.post(post_url, data=payload)
    print(p.text)
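The dictionary comprehension is what sets this answer apart from the question's code: instead of copying only the `ref` field, it submits every named `<input>` on the page with its default value and then overrides the credentials. Below is a standard-library sketch of the same idea. The field names `ref`, `username` and `password` come from the code above, but the sample form markup is invented for illustration, and `build_login_payload` is a hypothetical helper name.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Roughly mirrors soup.select('input[name]'): maps each named <input>
    to its value attribute, or '' if it has none."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def build_login_payload(html, username, password):
    collector = InputCollector()
    collector.feed(html)
    payload = dict(collector.fields)  # hidden fields such as "ref" are kept as-is
    payload["username"] = username
    payload["password"] = password
    return payload


# Hypothetical form markup, for illustration only.
sample = (
    '<form><input type="hidden" name="ref" value="/port_v1/default.asp">'
    '<input name="username"><input type="password" name="password">'
    '<input type="submit" value="Login"></form>'
)
print(build_login_payload(sample, "me@example.com", "secret"))
```

The unnamed submit button is skipped, just as `input[name]` skips it in the answer's selector.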