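I'm trying to make a program that gets information about my attendance from my college website. To do that, I wrote a script that logs in to the site, which takes me to my dashboard. It then goes to the Attendance tab, gets the href and appends it to the college website's URL. The tag in the attendance class looks like this: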
<a href="../Student/StudentAttendanceView.aspx?SID=7JyKkZE1Eyx2EYNii7tOjQ==|yaE7DmfR9r8=" id="aAttandance">Attendance</a>
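For reference, here is a minimal sketch of reading that href by the tag's id rather than by list position (`findAll('li')[4]`). It uses the standard library's `html.parser` so it runs without extra packages. With BeautifulSoup, the equivalent lookup would be `soup.find('a', id='aAttandance')['href']`. The sample HTML is just the tag quoted above; in a real run it would be the full dashboard page.

```python
from html.parser import HTMLParser


class HrefById(HTMLParser):
    """Records the href of the first <a> tag whose id matches target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.href = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and self.href is None and attributes.get("id") == self.target_id:
            self.href = attributes.get("href")


def find_href(html, target_id="aAttandance"):
    """Return the href of the <a id=target_id> tag in html, or None if absent."""
    parser = HrefById(target_id)
    parser.feed(html)
    parser.close()
    return parser.href


tag = ('<a href="../Student/StudentAttendanceView.aspx?'
       'SID=7JyKkZE1Eyx2EYNii7tOjQ==|yaE7DmfR9r8=" id="aAttandance">Attendance</a>')
print(find_href(tag))
# ../Student/StudentAttendanceView.aspx?SID=7JyKkZE1Eyx2EYNii7tOjQ==|yaE7DmfR9r8=
```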
When I click the Attendance link, the browser's address bar shows this URL:
http://erp.college_name.edu/Student/StudentAttendanceView.aspx?SID=7JyKkZE1Eyx2EYNii7tOjQ==|yaE7DmfR9r8=
So I should append the href to 'http://erp.college_name.edu'. OK, I did that:
L = 'http://erp.college_name.edu' + str(I.findAll('li')[4].a.get('href').replace('.', ''))
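Because the href is relative (it starts with `../`), you could also let the standard library resolve it instead of trimming characters by hand. This is a swapped-in technique, `urllib.parse.urljoin`, and not what the script above does. The dashboard URL below is a hypothetical placeholder, since the post doesn't give it.

```python
from urllib.parse import urljoin

# Hypothetical: the URL of the page the href was read from (the dashboard).
dashboard_url = "http://erp.college_name.edu/Student/Dashboard.aspx"
href = "../Student/StudentAttendanceView.aspx?SID=7JyKkZE1Eyx2EYNii7tOjQ==|yaE7DmfR9r8="

# urljoin resolves "../" against the base URL's path, the way a browser does,
# and leaves the query string (including the SID) untouched.
attendance_url = urljoin(dashboard_url, href)
print(attendance_url)
# http://erp.college_name.edu/Student/StudentAttendanceView.aspx?SID=7JyKkZE1Eyx2EYNii7tOjQ==|yaE7DmfR9r8=
```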
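But the problem is that the href I get is not what's in the tag, and it keeps changing. When I print L after building the link, I get this (which I assume is what I'm supposed to get):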
http://erp.college_name.edu/Student/StudentAttendanceViewaspx?SID=aDmK9cEFWwDqvsWw5ZzEOw==|oTeYVRfW1u8=
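But the href I get differs from the real URL, and it keeps changing every time I restart the program. This is what I got the second time:
http://erp.college_name.edu/Student/StudentAttendanceViewaspx?SID=WM/lbVRchyyBiLsDvkORJw==|MaP8NtvvrHE=
Why am I getting this? Also, when I click another link on my dashboard and then click the Attendance tab again, the URL from the href changes again in the address bar. Why?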
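One way to see what actually changes between runs is to split the two URLs the script printed into path and query. This sketch with `urllib.parse` only shows that the path stays the same while the `SID` query parameter differs. The post does not say why the server issues a new `SID`, whether per login session or per page load, so treat either explanation as an assumption.

```python
from urllib.parse import urlsplit, parse_qs

# The two URLs printed by the script on two different runs (copied from above).
run_1 = "http://erp.college_name.edu/Student/StudentAttendanceViewaspx?SID=aDmK9cEFWwDqvsWw5ZzEOw==|oTeYVRfW1u8="
run_2 = "http://erp.college_name.edu/Student/StudentAttendanceViewaspx?SID=WM/lbVRchyyBiLsDvkORJw==|MaP8NtvvrHE="


def split_sid(url):
    """Return (path, SID) for a URL whose query string carries an SID parameter."""
    parts = urlsplit(url)
    # parse_qs splits each name=value pair on the first "=", so the "==" padding
    # stays inside the value.
    sid = parse_qs(parts.query)["SID"][0]
    return parts.path, sid


path_1, sid_1 = split_sid(run_1)
path_2, sid_2 = split_sid(run_2)
print(path_1 == path_2)  # True: same page
print(sid_1 == sid_2)    # False: only the SID token differs
```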
So after that I did:
import requests
from bs4 import BeautifulSoup
opens = requests.get(L)
soup_2 = BeautifulSoup(opens.text, 'lxml')
print(soup_2)  # print the parsed response page
and I got this:
C:\Users\HUNTER\AppData\Local\Programs\Python\Python35-32\python.exe C:/Users/HUNTER/PycharmProjects/dictionary/erp_1.py
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html><head><title>The page cannot be found</title>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<style type="text/css">
BODY { font: 8pt/12pt verdana }
H1 { font: 13pt/15pt verdana }
H2 { font: 8pt/12pt verdana }
A:link { color: red }
A:visited { color: maroon }
</style>
</head><body><table border="0" cellspacing="10" width="500"><tr><td>
<h1>The page cannot be found</h1>
The page you are looking for might have been removed, had its name
changed, or is temporarily unavailable.
<hr/>
<p>Please try the following:</p>
<ul>
<li>Make sure that the Web site address displayed in the address bar of
your browser is spelled and formatted correctly.</li>
<li>If you reached this page by clicking a link, contact
the Web site administrator to alert them that the link is incorrectly
formatted.
</li>
<li>Click the <a href="javascript:history.back(1)">Back</a> button to
try
another link.</li>
</ul>
<h2>HTTP Error 404 - File or directory not found.<br/>Internet
Information
Services (IIS)</h2>
<hr/>
<p>Technical Information (for support personnel)</p>
<ul>
<li>Go to <a href="http://go.microsoft.com/fwlink/?linkid=8180">Microsoft
Product Support Services</a> and perform a title search for the words
<b>HTTP</b> and <b>404</b>.</li>
<li>Open <b>IIS Help</b>, which is accessible in IIS Manager (inetmgr),
and search for topics titled <b>Web Site Setup</b>, <b>Common
Administrative
Tasks</b>, and <b>About Custom Error Messages</b>.</li>
</ul>
</td></tr></table></body></html>
Process finished with exit code 0
UPDATE
I replaced the .replace('.', '') call with [2:], because replace was also removing the . from .aspx in the href. Now the problem has changed to {{3}}.
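The difference described in this update can be reproduced on the href from the tag quoted at the top. `.replace('.', '')` removes every dot, including the one in `.aspx`, while `[2:]` drops only the first two characters.

```python
href = "../Student/StudentAttendanceView.aspx?SID=7JyKkZE1Eyx2EYNii7tOjQ==|yaE7DmfR9r8="

# Removes every ".", so ".aspx" becomes "aspx" (the path in the 404 run above).
by_replace = href.replace(".", "")
# Drops only the leading "..", keeping "/Student/...View.aspx" intact.
by_slice = href[2:]

print(by_replace)  # /Student/StudentAttendanceViewaspx?SID=7JyKkZE1Eyx2EYNii7tOjQ==|yaE7DmfR9r8=
print(by_slice)    # /Student/StudentAttendanceView.aspx?SID=7JyKkZE1Eyx2EYNii7tOjQ==|yaE7DmfR9r8=
```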
But since the value of the href keeps changing, how can I get to that page?
Any help?
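Since the SID differs on every run, it may be worth checking that the attendance page is requested in the same logged-in session that produced the href. A bare `requests.get` call does not send cookies from an earlier, separate request unless you pass them explicitly. Below is a rough sketch of the flow described at the top (log in, open the dashboard, read the Attendance href, follow it) using the standard library's cookie-aware opener. `requests.Session()` plays the same role if you use `requests`. Several parts are assumptions:

- The site tracks the login with cookies.
- The login URL, form field names and dashboard URL are hypothetical placeholders.
- An ASP.NET login form may also need hidden fields such as `__VIEWSTATE` copied from the login page.

```python
import http.cookiejar
import urllib.parse
import urllib.request
from html.parser import HTMLParser

# Hypothetical placeholders: the real login page, form fields and dashboard URL
# are not shown in the post.
LOGIN_URL = "http://erp.college_name.edu/Login.aspx"
DASHBOARD_URL = "http://erp.college_name.edu/Student/Dashboard.aspx"
LOGIN_FORM = {"username": "your-id", "password": "your-password"}


class HrefById(HTMLParser):
    """Records the href of the first <a> tag whose id matches target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.href = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and self.href is None and attributes.get("id") == self.target_id:
            self.href = attributes.get("href")


def make_opener():
    """Build an opener that stores cookies from responses and sends them on later requests."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def fetch(opener, url, form=None):
    """GET url, or POST the form if one is given, and return the decoded body."""
    data = urllib.parse.urlencode(form).encode() if form is not None else None
    with opener.open(url, data=data) as response:
        return response.read().decode("utf-8", errors="replace")


def attendance_url(dashboard_html, dashboard_url):
    """Read the current Attendance href and resolve it against the dashboard URL."""
    parser = HrefById("aAttandance")
    parser.feed(dashboard_html)
    parser.close()
    if parser.href is None:
        raise ValueError("Attendance link not found on the dashboard")
    return urllib.parse.urljoin(dashboard_url, parser.href)


def get_attendance_page():
    """Run every request through one opener so they all share the same cookie jar."""
    opener = make_opener()
    fetch(opener, LOGIN_URL, LOGIN_FORM)            # log in; session cookies land in the jar
    dashboard = fetch(opener, DASHBOARD_URL)        # same cookies are sent here
    url = attendance_url(dashboard, DASHBOARD_URL)  # SID read from this same session
    return fetch(opener, url)
```

The design choice here is simply that the href is read fresh from the dashboard on every run instead of being stored, because a stored SID was observed to change between runs.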