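I am trying to scrape the website "https://laboral.pjud.cl/SITLAPORWEB/InicioAplicacionPortal.do", but every request returns the same page with an error. I think the problem is that I have to authenticate on this site first.
I tried creating a session object and sending a POST request, but nothing seems to change.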
import requests
from bs4 import BeautifulSoup
from requests.auth import HTTPBasicAuth
username = 'user'
password = 'pass'
scrape_url = 'https://laboral.pjud.cl/SITLAPORWEB/InicioAplicacionPortal.do'
login_url = 'https://laboral.pjud.cl/SITLAPORWEB/jsp/LoginPortal/LoginPortal.jsp'
# First attempt: request the login page with HTTP Basic authentication
r = requests.get(login_url, auth=HTTPBasicAuth(username, password))
print(r.text)
Output:
<form name="InicioAplicacionForm" method="POST"
action="/SITLAPORWEB/InicioAplicacionPortal.do"><INPUT
type="hidden" name="FLG_Autoconsulta" value="1"><input
type="hidden" name="D0E0F02E"
value="764C8AA111F42E621BC10BA16CD8D8B2">
</form><script>document.InicioAplicacionForm.submit();</script>
# Second attempt: POST the credentials plus the hidden token copied from the output above
login_info = {'username': username, 'password': password, 'D0E0F02E': '764C8AA111F42E621BC10BA16CD8D8B2'}
session = requests.Session()
session.post(url=login_url, data=login_info)
response = session.get(url=scrape_url)
soup = BeautifulSoup(response.content, 'html.parser')
print(soup)
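One direction that may be worth trying, sketched under assumptions rather than confirmed against the site: the page printed above is a form that submits itself through JavaScript, and `requests` does not run JavaScript. If the hidden field `D0E0F02E` is regenerated on every response (an assumption), a hard-coded copy would be stale, and the token would need to be read from a fresh response and posted back within the same session so the cookies match. The helper names `parse_auto_submit_form` and `follow_auto_submit_form` are hypothetical, and whether the site requires a further login step after this form is not known from the output shown.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AutoSubmitFormParser(HTMLParser):
    """Collects the action, method, and named inputs of the first form."""

    def __init__(self):
        super().__init__()
        self.found = False      # True once a <form> tag has been seen
        self.action = ""
        self.method = "GET"
        self.fields = {}
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <INPUT> works too.
        attrs = dict(attrs)
        if tag == "form" and not self.found:
            self.found = True
            self._in_form = True
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "GET").upper()
        elif tag == "input" and self._in_form and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def parse_auto_submit_form(html, base_url):
    """Return (absolute_url, method, fields) for the first form, or None."""
    parser = AutoSubmitFormParser()
    parser.feed(html)
    if not parser.found:
        return None
    return urljoin(base_url, parser.action), parser.method, parser.fields


def follow_auto_submit_form(session, page_url):
    """Fetch page_url and replay its form by hand, reusing the same session.

    `session` is expected to behave like requests.Session (get/post methods).
    """
    response = session.get(page_url)
    form = parse_auto_submit_form(response.text, response.url)
    if form is None:
        return response  # no form on the page, nothing to replay
    target, method, fields = form
    if method == "POST":
        return session.post(target, data=fields)
    return session.get(target, params=fields)
```

With the variables from the question, this could be called as `session = requests.Session()` followed by `page = follow_auto_submit_form(session, login_url)`. A later `session.get(scrape_url)` would then send whatever cookies the server set during those requests. If the response still shows the error page, inspecting the real login request in the browser's network tab should reveal which field names and URL the site actually expects, since `username` and `password` in the attempt above are guesses.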