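I am trying to write a script that logs in to a website and then scrapes data from a specific page that is only accessible after logging in. The data never comes through in the IDLE shell, whether or not I am logged in, which tells me the site must use some kind of verification token or ID that I cannot see in the login page's source. I have gone through the login page's code several times but cannot find anything else that might be missing. I am not sure whether I am allowed to post another site's HTML here, but this is the script I am working on.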
Please excuse the commented-out section with the Excel script:
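I have tried navigating the page with both lxml and BeautifulSoup, but nothing seems to work. I have tried similar scripts on other, simpler websites, and they worked fine in most cases.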
import requests
from lxml import html

# Placeholders: replace with the real credentials
USERNAME = "<username>"
PASSWORD = "<password>"

LOGIN_URL = "https://www.tm3.com/homepage/login.jsf"
URL = "https://www.tm3.com/mmdrewrite/mmd/14902.faces"


def main():
    session_requests = requests.session()

    # Get login csrf token (the JSF view state from the login form)
    result = session_requests.get(LOGIN_URL)
    tree = html.fromstring(result.text)
    authenticity_token = list(set(tree.xpath('//input[@name="javax.faces.ViewState"]/@value')))[0]

    # Create payload
    payload = {
        "username": USERNAME,
        "password": PASSWORD,
        "javax.faces.ViewState": authenticity_token,
    }

    # Perform login
    result = session_requests.post(LOGIN_URL, data=payload, headers=dict(referer=LOGIN_URL))

    # Scrape url
    result = session_requests.get(URL, headers=dict(referer=URL))
    tree = html.fromstring(result.content)
    print('', result.content)


"""
#excel scripts
def excel():
    import xlwt
    book = xlwt.Workbook(encoding="utf-8")
    sheet1 = book.add_sheet("Sheet1")
    #for loop for putting data into different cells
    num = 0
    row = sheet1.row(num)
    row.write(num, test)
    print("EVEN:", test)
    print("ODD:", ODD)
    book.save("Testing.xls")
"""

if __name__ == '__main__':
    main()
I expected the script to print the full target page, but it only prints the login page.
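
One thing I could check is whether the login form carries more fields than the three I am sending, and whether the response I get back is still the login page. Below is a minimal sketch of that check, not a confirmed fix. It uses the standard-library `html.parser` instead of lxml so it runs without extra packages. The sample HTML and the field names `loginForm`, `loginForm:username` and `loginForm:password` are hypothetical, invented for illustration; the real names would have to be read from the actual login page.

```python
from html.parser import HTMLParser


class FormInputCollector(HTMLParser):
    """Collect name/value pairs of every named <input> inside the first <form>."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.form_seen = False
        self.fields = {}
        self.has_password_field = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self.form_seen:
            self.in_form = True
            self.form_seen = True
        elif tag == "input":
            # A password input anywhere on the page suggests a login form.
            if attrs.get("type") == "password":
                self.has_password_field = True
            name = attrs.get("name")
            if self.in_form and name:
                self.fields[name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def build_login_payload(login_html, username, password,
                        user_field="username", pass_field="password"):
    """Start from every input the form already contains, then add credentials."""
    collector = FormInputCollector()
    collector.feed(login_html)
    payload = dict(collector.fields)
    payload[user_field] = username
    payload[pass_field] = password
    return payload


def looks_like_login_page(page_html):
    """Heuristic only: treat any page with a password input as the login page."""
    collector = FormInputCollector()
    collector.feed(page_html)
    return collector.has_password_field


# Hypothetical login form, NOT copied from the real site.
SAMPLE_LOGIN_HTML = """
<form id="loginForm" method="post">
  <input type="hidden" name="loginForm" value="loginForm" />
  <input type="text" name="loginForm:username" />
  <input type="password" name="loginForm:password" />
  <input type="hidden" name="javax.faces.ViewState" value="abc123" />
</form>
"""

payload = build_login_payload(
    SAMPLE_LOGIN_HTML, "<username>", "<password>",
    user_field="loginForm:username", pass_field="loginForm:password",
)
print(payload)
print(looks_like_login_page(SAMPLE_LOGIN_HTML))
```

In my script this would mean passing `result.text` from the first `session_requests.get(LOGIN_URL)` to `build_login_payload`, and calling `looks_like_login_page(result.text)` on the response to the POST to see whether the login was rejected before requesting the data page.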