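The web page I need to scrape data from sits behind a login page. I have tried a number of ways to get past it, but none of them seem to work. Can anyone help? My code is below...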
import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent sent with every request
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/70.0.3538.110 Safari/537.36'}

# Form fields submitted with the login request
login_data = {
    'appname': 'unknown',
    'appversion': 'unknown',
    'ostype': 'mozilla/5.0 (windows nt 10.0; win64; x64) '
              'applewebkit/537.36 (khtml, like gecko) '
              'chrome/70.0.3538.110 safari/537.36',
    'type': 'null',
    'ssobypass': 'true',
    'dirlogin': 'true',
    'inch': 'true',
    'scrWidth': '1920',
    'scrHeight': '1040',
    'username': 'TA_KAITM_B_4a',
    'userpassword': ''}

with requests.Session() as s:
    url = "http://cmis.ittdublin.ie"
    # Load the site first so the session picks up any cookies
    r = s.get(url, headers=headers)
    soup = BeautifulSoup(r.content, 'lxml')
    # Submit the login form data to the same URL
    r = s.post(url, data=login_data, headers=headers)
    print(r.content)
I am not allowed to add the HTML of the login screen here... Below is code that, if run, returns the HTML of the login page...
import requests
from lxml import html

session_requests = requests.session()
login_url = "http://cmis.ittdublin.ie/eportal/index.jsp"

# Fetch the login page first so the session picks up any cookies
result = session_requests.get(login_url)

payload = {
    "username": "TA_KAITM_B_4a"
}

# Post the form data back to the login page URL
result = session_requests.post(
    login_url,
    data=payload,
    headers=dict(referer=login_url)
)
print(result.text)

# Request the same page again through the session
url = 'http://cmis.ittdublin.ie/eportal/index.jsp'
result = session_requests.get(
    url,
    headers=dict(referer=url)
)
Answer 0 (score: 0)
The URL you need to post to is
http://cmis.ittdublin.ie/eportal/PortalServ?reqtype=login
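I'm fairly confident about that. Whether it gets you anywhere useful depends on what setAdminLoginLocation() does, but it doesn't appear to do anything apart from the admin login.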
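A minimal sketch of posting to that URL with requests, in case it helps. The field names username and userpassword are taken from the login_data dict in the question; whether the server also needs the other fields, a Referer header, or something else is an assumption I have not verified. The helper names build_login_request and login are made up for illustration.

```python
import requests

# Login endpoint from the answer above
LOGIN_URL = "http://cmis.ittdublin.ie/eportal/PortalServ?reqtype=login"


def build_login_request(username, password):
    """Build (but do not send) the login POST.

    Assumption: the form uses the 'username' and 'userpassword' fields
    seen in the question's login_data; the real form may need more.
    """
    payload = {"username": username, "userpassword": password}
    return requests.Request("POST", LOGIN_URL, data=payload)


def login(session, username, password):
    """Send the login POST through the session so its cookies are kept."""
    prepared = session.prepare_request(build_login_request(username, password))
    return session.send(prepared)


# Usage (not run here, it needs network access and real credentials):
#   with requests.Session() as s:
#       s.get("http://cmis.ittdublin.ie/eportal/index.jsp")  # pick up cookies
#       r = login(s, "TA_KAITM_B_4a", "your password")
#       print(r.status_code)
```

If the response is still the login page, comparing this request with the one your browser sends (Network tab in the developer tools) should show which fields or headers are missing.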