Python script for logging into a website does not work

Time: 2016-10-23 06:02:45

Tags: python python-2.7 web-scraping

Here is my code:

from requests import session

payload = {
    'action': 'login',
    'TB1$TabPanel1$txt_user_name': '2016XXXXXXXXX',
    'TB1$TabPanel1$txt_password': '6XXXXXXXX'
}

with session() as c:
    c.post('http://erp.sangamuniversity.ac.in/Default.aspx', data=payload)
    response = c.get('http://erp.sangamuniversity.ac.in/Student/Student_Home.aspx')
    print(response.text)

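After logging in, the print statement prints the HTML source of the login page (http://erp.sangamuniversity.ac.in/Default.aspx) instead of the home page (http://erp.sangamuniversity.ac.in/Student/Student_Home.aspx). Why? (Supply a real username and password to test the script.)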

2 answers:

Answer 0: (score: 1)

It looks like the form on the Default.aspx page actually posts to default1.aspx:

<form id="Form1" method="post" action="default1.aspx" onsubmit="javascript:return WebForm_OnSubmit();">

You should update your script to post to default1.aspx instead of Default.aspx.
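Rather than reading the form tag by hand, the submit target can be looked up in code. The following is a minimal sketch, assuming the login form is the first form on the page; `FormActionParser` and `form_post_url` are hypothetical names introduced here, not part of requests or of the site.

```python
try:
    from html.parser import HTMLParser   # Python 3
    from urllib.parse import urljoin
except ImportError:
    from HTMLParser import HTMLParser    # Python 2.7
    from urlparse import urljoin


class FormActionParser(HTMLParser):
    """Records the action attribute of the first <form> tag seen."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.action = None

    def handle_starttag(self, tag, attrs):
        if tag == 'form' and self.action is None:
            self.action = dict(attrs).get('action')


def form_post_url(page_url, html):
    """Return the absolute URL that the first form on the page submits to."""
    parser = FormActionParser()
    parser.feed(html)
    # A missing or empty action means the form posts back to the page itself.
    return urljoin(page_url, parser.action or '')
```

Calling `form_post_url` with the Default.aspx URL and the form tag quoted above should resolve the relative `default1.aspx` action against the page URL, which is the address the POST request needs to target.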

Answer 1: (score: 1)

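The data the page sends to default1.aspx is not as simple as it seems. Here is the form data I captured when I tried to log in to the site (with a fake username and password, of course).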

TB1_ClientState:{"ActiveTabIndex":0,"TabEnabledState":[true,true,true],"TabWasLoadedOnceState":[true,false,false]}
__EVENTTARGET:
__EVENTARGUMENT:
__VIEWSTATE:3H3N0D3O2bCVzC2589hkM0GNDQYgWbyM+6Fzg470Vn7uHJPXG42wBOjU+FnIquPNgTGURfUaXH7E8dehKn80fYveeDWBW9SPT3tWSZ4yZ7H/k0XE70YmktM2gmvlFlze7irk2q+hdoMxKVPTdBws80R/AvSAB2ldKUFAnMCskeFQnIPp+lHZ9v1cbCP7p1tLtlKANnRtw0Zq9M4Cudhw7Fn/IaMEjGCMfqfgSxLxKcscUzI0stDgdJBv5umfiZ93bXAJEFzjkVG0Rf1/TWLE27I5+xkHTJdBqSKnEzHZK/Y8XL+6rYiwJVc1eghUXAo4vaiaMI9zTWVTnHS4e93MMDVCepetHlY2H2xnNw0onAM7z7yAlNmcmER+aprvF/GLVL/cFDjX3+gruGh6LN0/9vdJpV0pLcHNxP4JfYE2sta9JeM1gdqM/7yN81/rx9JZGH7rgXUUlTbQV/+DnqYv5jO1RHTjnVe3AUet33gPyTGXdcjuBWVclZ7ZFs7GCa11tlQ6xGHvm8YdNvk1iUG8mKWIbmGTv4F5CrDGxA52jUjZgZdoujffY0zsjrTk1EVZNWuqq78wSOH+btJzskVjVkjWsMts7mne44FvVV8whEXV5imPM9ElZNG0m6BP1LkY7kTK+VQyYimOa/PvOIrx2rdodw+5T3C57R35XtAVHXr03wd2Wfqz1twu4zW34RDDFK6dBV+clvkHdXUphoQ8HvE+P9TKbsxJs9bxIKmjQCZ7THRWV1rU5RMRkvGgihESvnqqMUGCUDrY2aS8zmT6icvn+UkQEjgUVhCCmL0Z8bX/pr1D
__EVENTVALIDATION:ytnL3SjUuajpScr6kRi7DHy+a22yoaqVaYYAdgYolE2kdnwgXTllfDSQsE4Bmbb0yRN5myFapGWY6CboGRzfPgRjNbwoIHUVB3Nt9mebbCffoTliXlLHlHzIqwiNTPsIfHWxvqNuu/DX8LqI/XwDTPt4Bt57VpIbz37jsqB4A92qH5IMlylz81QRjmxfYGgjNBQkBbjPli7Pvw/gQ7v4SB4s+F+piQlD/71jwNay09tbXwZ2JovKm2igiyHpREbt/NIOpeiz0VOqhqPXdXo0vuAGFLp+Q7YltK5I5OZW7GoKArGv6Zpoqw85uctp0XthOIJVmKVfs/LZgps+nhKylyrJGTfLNO1Wl2rLZ+PiH36BBP8b9oihYjsn7Hvv0wvnfA/9VoOYx32ig6gyw1pmJ9R4nijwsuPAgOIRQQ8YmL9t48KUjTzEnwBXLyGYKPkolNo8ny7CAi2c/kQBYClAYxPPr5VBQmJUtBU/0Cui+tP+wXMa
TB1$TabPanel1$txt_user_name:222222
TB1$TabPanel1$txt_password:22222
TB1$TabPanel1$btn_log:Login
TB1$tbpanel2$F_drp_college:1
TB1$tbpanel2$tb_emp_id:
TB1$tbpanel2$tb_emp_password:
TB1$TabPanel2$txtusernameparent:
TB1$TabPanel2$txtPasswordParent:

Some of the values in this form can be found in the page's HTML source, for example:

<div class="aspNetHidden">
    <input type="hidden" name="TB1_ClientState" id="TB1_ClientState" value="{&quot;ActiveTabIndex&quot;:0,&quot;TabEnabledState&quot;:[true,true,true],&quot;TabWasLoadedOnceState&quot;:[true,false,false]}">
    <input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="">
    <input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="">
    <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="3H3N0D3O2bCVzC2589hkM0GNDQYgWbyM+6Fzg470Vn7uHJPXG42wBOjU+FnIquPNgTGURfUaXH7E8dehKn80fYveeDWBW9SPT3tWSZ4yZ7H/k0XE70YmktM2gmvlFlze7irk2q+hdoMxKVPTdBws80R/AvSAB2ldKUFAnMCskeFQnIPp+lHZ9v1cbCP7p1tLtlKANnRtw0Zq9M4Cudhw7Fn/IaMEjGCMfqfgSxLxKcscUzI0stDgdJBv5umfiZ93bXAJEFzjkVG0Rf1/TWLE27I5+xkHTJdBqSKnEzHZK/Y8XL+6rYiwJVc1eghUXAo4vaiaMI9zTWVTnHS4e93MMDVCepetHlY2H2xnNw0onAM7z7yAlNmcmER+aprvF/GLVL/cFDjX3+gruGh6LN0/9vdJpV0pLcHNxP4JfYE2sta9JeM1gdqM/7yN81/rx9JZGH7rgXUUlTbQV/+DnqYv5jO1RHTjnVe3AUet33gPyTGXdcjuBWVclZ7ZFs7GCa11tlQ6xGHvm8YdNvk1iUG8mKWIbmGTv4F5CrDGxA52jUjZgZdoujffY0zsjrTk1EVZNWuqq78wSOH+btJzskVjVkjWsMts7mne44FvVV8whEXV5imPM9ElZNG0m6BP1LkY7kTK+VQyYimOa/PvOIrx2rdodw+5T3C57R35XtAVHXr03wd2Wfqz1twu4zW34RDDFK6dBV+clvkHdXUphoQ8HvE+P9TKbsxJs9bxIKmjQCZ7THRWV1rU5RMRkvGgihESvnqqMUGCUDrY2aS8zmT6icvn+UkQEjgUVhCCmL0Z8bX/pr1D">
</div>

<div class="aspNetHidden">
    <input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="ytnL3SjUuajpScr6kRi7DHy+a22yoaqVaYYAdgYolE2kdnwgXTllfDSQsE4Bmbb0yRN5myFapGWY6CboGRzfPgRjNbwoIHUVB3Nt9mebbCffoTliXlLHlHzIqwiNTPsIfHWxvqNuu/DX8LqI/XwDTPt4Bt57VpIbz37jsqB4A92qH5IMlylz81QRjmxfYGgjNBQkBbjPli7Pvw/gQ7v4SB4s+F+piQlD/71jwNay09tbXwZ2JovKm2igiyHpREbt/NIOpeiz0VOqhqPXdXo0vuAGFLp+Q7YltK5I5OZW7GoKArGv6Zpoqw85uctp0XthOIJVmKVfs/LZgps+nhKylyrJGTfLNO1Wl2rLZ+PiH36BBP8b9oihYjsn7Hvv0wvnfA/9VoOYx32ig6gyw1pmJ9R4nijwsuPAgOIRQQ8YmL9t48KUjTzEnwBXLyGYKPkolNo8ny7CAi2c/kQBYClAYxPPr5VBQmJUtBU/0Cui+tP+wXMa">
</div>

The following code has been tested.

from requests import session

payload = {
    'TB1_ClientState':'{"ActiveTabIndex":0,"TabEnabledState":[true,true,true],"TabWasLoadedOnceState":[true,false,false]}',
    '__EVENTTARGET':'',
    '__EVENTARGUMENT':'',
    '__VIEWSTATE': '<enter your own viewstate from the page>',
    '__EVENTVALIDATION': '<enter your own eventvalidation from the page>',
    'TB1$TabPanel1$txt_user_name': '<enter your own username>',
    'TB1$TabPanel1$txt_password': '<enter your own password>',
    'TB1$TabPanel1$btn_log':'Login',
    'TB1$tbpanel2$F_drp_college':1,
    'TB1$tbpanel2$tb_emp_id':'',
    'TB1$tbpanel2$tb_emp_password':'',
    'TB1$TabPanel2$txtusernameparent':'',
    'TB1$TabPanel2$txtPasswordParent':''
}

with session() as c:
    c.post('http://erp.sangamuniversity.ac.in/Default1.aspx', data=payload)
    response = c.get('http://erp.sangamuniversity.ac.in/Student/Student_Home.aspx')
    print(response.text)

You can look for the elements with the aspNetHidden class to get the __VIEWSTATE and __EVENTVALIDATION values. Hope this helps in your case :)
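Since the hidden values are generated by the server and may differ between page loads, copying them by hand could go stale. Here is a rough sketch of reading them from the login page at run time instead. It assumes the field names captured above are still current; `HiddenInputParser`, `build_payload` and `login` are hypothetical helpers written for this illustration, and `c` is expected to be a requests session like the one in the script above.

```python
try:
    from html.parser import HTMLParser   # Python 3
except ImportError:
    from HTMLParser import HTMLParser    # Python 2.7

BASE = 'http://erp.sangamuniversity.ac.in/'


class HiddenInputParser(HTMLParser):
    """Collects the name/value pair of every <input type="hidden">."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_hidden = (attrs.get('type') or '').lower() == 'hidden'
        if tag == 'input' and is_hidden and attrs.get('name'):
            # The parser has already unescaped the value (&quot; becomes ").
            self.fields[attrs['name']] = attrs.get('value') or ''


def build_payload(login_html, username, password):
    """Merge the page's current hidden fields with the visible form fields."""
    parser = HiddenInputParser()
    parser.feed(login_html)
    payload = dict(parser.fields)
    payload.update({
        'TB1$TabPanel1$txt_user_name': username,
        'TB1$TabPanel1$txt_password': password,
        'TB1$TabPanel1$btn_log': 'Login',
        'TB1$tbpanel2$F_drp_college': '1',
        'TB1$tbpanel2$tb_emp_id': '',
        'TB1$tbpanel2$tb_emp_password': '',
        'TB1$TabPanel2$txtusernameparent': '',
        'TB1$TabPanel2$txtPasswordParent': '',
    })
    return payload


def login(c, username, password):
    """Fetch the login page, post fresh hidden values, return the home page HTML."""
    login_html = c.get(BASE + 'Default.aspx').text
    c.post(BASE + 'default1.aspx', data=build_payload(login_html, username, password))
    return c.get(BASE + 'Student/Student_Home.aspx').text
```

With requests installed, this could be used as `with session() as c: print(login(c, username, password))`. Fetching the login page through the same session also means any cookies set on that first request are sent along with the POST.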