Python: How to get hidden HTML content from an HTML page

Time: 2017-03-23 11:06:54

Tags: javascript jquery html python-3.x robobrowser

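I am trying to write a program that uses robobrowser to renew the books I have borrowed from the library. To do that, I need to

1) log in with my ID 2) tick the checkbox for each book to renew 3) click submit

After I log in and print the page response, it does print the page source, except for the source of the table that holds all the checkboxes and book titles.

It should also show the source of that table, i.e.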

     <tr>
        <td valign="top">   














    <form action="./selectedBooks" method="get" id="form1" target="nid">
    <table border="0" class="briefListTbl" cellspacing="0" cellpadding="0" 
    valign="top">
    <tbody><tr><td>
    <table valign="top" border="0" cellspacing="0" cellpadding="0" height="100%" 
    width="100%">
       <tbody class="briefListHead">
    <tr align="middle">


            <td width="10%">
                AccNo.
            </td>

            <td width="40%">
                Title
            </td>

            <td width="15%">
                Author
            </td>

            <td width="15%">
                Due date
            </td>

            <td width="10%">
                Reserved
            </td>

            <td>Renew</td> 

    </tr>
    </tbody></table>                                        
             </td></tr>
        <tr>
        <td class="briefListBody">

           <div class="scrollbarTbl" id="divTblScroll" style="overflow: auto; 
    height: 178px; width: 1176px;">
           <table class="briefListTbl" border="0" cellspacing="2" 
    cellpadding="0" width="100%" height="100%">
           <tbody valign="top">
            <tr class="briefListRow1" id="checkoutsRow1">

                                <td width="10%">50884</td>

                        <td width="40%" align="left">Programming with C (005.133 
    GOT;  42)</td>
                        <td width="15%" align="left">Gottfried, Byron</td>
                        <td width="15%" align="left">24/03/2017</td>
                        <td width="10%" align="left">-</td>
                                <td align="center"><input type="checkbox" 
    name="selectedforRenewal" value="0"></td>

                        </tr>

                        <tr class="briefListRow2" id="checkoutsRow2">

                                <td width="10%">51203</td>

                        <td width="40%" align="left">Engineering physics 
    (621PRA;  147)</td>
                    <td width="15%" align="left"> dr.psk formy</td>
                    <td width="15%" align="left">24/03/2017</td>
                    <td width="10%" align="left">-</td>
                                <td align="center"><input type="checkbox" 
    name="selectedforRenewal" value="1"></td>

                    </tr>

                    <tr class="briefListRow1" id="checkoutsRow3">

                            <td width="10%">20810</td>

                    <td width="40%" align="left">Objective Mathematics (511 SHA 17)</td>
                    <td width="15%" align="left">SHA(R.D)</td>
                    <td width="15%" align="left">30/03/2017</td>
                    <td width="10%" align="left">-</td>
                            <td align="center"><input type="checkbox" name="selectedforRenewal" value="2"></td>

                    </tr>

                    <tr class="briefListRow2" id="checkoutsRow4">

                            <td width="10%">22455</td>

                    <td width="40%" align="left">Elements of Mechanical Engineering (620 PRA 19)</td>
                    <td width="15%" align="left">holydon</td>
                    <td width="15%" align="left">03/04/2017</td>
                    <td width="10%" align="left">-</td>
                            <td align="center"><input type="checkbox" name="selectedforRenewal" value="3"></td>

                    </tr>

                    <tr height="100%"><td><input type="hidden" name="searchOrBrowse" value="checkouts"></td></tr>
           </tbody>
           </table>
    </div>


    </td>
</tr>
</tbody>
<tbody><tr>


                <td class="briefListFoot">

                    <table border="0" width="100%" class="briefListFoot" cellspacing="0" cellpadding="0">
                    <tbody><tr>
                        <td align="left" valign="top" width="0*" height="0*" nowrap="">

                        </td>
                        <td align="middle" valign="center" height="0*">

                                <a class="briefListHREFFoot" href="javascript:showpopupFrame('./renew');"><b>Renew</b></a>

                            </td>

                           <td align="right" valign="top" width="0*" height="0*" nowrap="">

                                <a class="briefListHREFFoot">Next<img border="0" src="../images/button/nill.GIF" alt=""></a>

                        </td>
                    </tr>
                </tbody></table></td>



</tr>


</tbody></table>



                </form></td></tr> 

But it only prints

<tr><td valign="top">
</td></tr>
</table>
</td>
</tr>
</table>

And my code:

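    import requests
    from robobrowser import RoboBrowser

    # Assumption: `start` is a requests session created earlier; the original snippet did not show it.
    start = requests.Session()

    browser = RoboBrowser(parser='lxml', user_agent='Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11', history=True, session=start)

    # Open the checkouts page and log in through the member login form
    browser.open('http://61.12.27.181:8080/opac/html/checkouts')
    sign_in = browser.get_form(action='./memberlogin')
    sign_in['txtmemberid'].value = 'MyId'
    browser.submit_form(sign_in, submit='  Go  ')
    # Look up the renewal form, then print the parsed page to inspect what came back
    box = browser.get_form(action='./selectedBooks')
    print(browser.parsed)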

The entire content of the td tag is missing. I don't think robobrowser is the cause, since I am also passing in a session, so why does this happen? Any help?
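One way to confirm the symptom programmatically is to look for the renewal checkboxes in whatever HTML came back. The following is a minimal sketch using only the standard library; the helper names `RenewalCheckboxFinder` and `find_renewal_values` are hypothetical, and the two sample strings are shortened versions of the expected and actual output quoted above.

```python
from html.parser import HTMLParser


class RenewalCheckboxFinder(HTMLParser):
    """Collects the value of every checkbox named 'selectedforRenewal'."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        if (attributes.get("type") == "checkbox"
                and attributes.get("name") == "selectedforRenewal"):
            self.values.append(attributes.get("value"))


def find_renewal_values(html):
    """Return the checkbox values found in html (empty list if none)."""
    finder = RenewalCheckboxFinder()
    finder.feed(html)
    return finder.values


# Shortened sample of the table the browser shows (two of the four rows).
EXPECTED_HTML = """
<form action="./selectedBooks" method="get" id="form1" target="nid">
<table><tbody>
<tr><td><input type="checkbox" name="selectedforRenewal" value="0"></td></tr>
<tr><td><input type="checkbox" name="selectedforRenewal" value="1"></td></tr>
<tr><td><input type="hidden" name="searchOrBrowse" value="checkouts"></td></tr>
</tbody></table>
</form>
"""

# The fragment that was actually printed in the question.
ACTUAL_HTML = """
<tr><td valign="top">
</td></tr>
</table>
"""

if __name__ == "__main__":
    print(find_renewal_values(EXPECTED_HTML))
    print(find_renewal_values(ACTUAL_HTML))
```

Calling `find_renewal_values(str(browser.parsed))` after logging in would show whether the server sent the checkboxes at all; an empty list means the table was never in the response, which points at the request rather than at the parser.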

1 Answer:

Answer 0: (score: 0)

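This problem can be solved by also requesting the script URLs listed in the Network tab of the browser's Inspect Element tool, i.e.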

1) Go to the Network tab in the Inspect Element tool 2) Get the URLs of the js and css responses that returned OK, and request them with browser.session.get(url_of_js_or_css) in Robobrowser
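The two steps above could be sketched roughly as follows. This is only an illustration under assumptions: instead of copying URLs by hand from the Network tab, it collects the script and stylesheet URLs referenced in the page HTML, which may not cover every request the browser makes. The names `AssetCollector`, `collect_asset_urls`, and `fetch_assets` are hypothetical, and the asset paths in `SAMPLE_HTML` are invented for the example. Whether requesting these files makes the table appear depends on the site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AssetCollector(HTMLParser):
    """Collects absolute URLs of <script src> and stylesheet <link href> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "script" and attributes.get("src"):
            self.urls.append(urljoin(self.base_url, attributes["src"]))
        elif tag == "link" and attributes.get("href"):
            rel = (attributes.get("rel") or "").lower()
            if "stylesheet" in rel:
                self.urls.append(urljoin(self.base_url, attributes["href"]))


def collect_asset_urls(html, base_url):
    """Return the js/css URLs referenced in html, resolved against base_url."""
    collector = AssetCollector(base_url)
    collector.feed(html)
    return collector.urls


def fetch_assets(session, urls):
    """Request each URL with the given session; return {url: status_code}."""
    return {url: session.get(url).status_code for url in urls}


# Hypothetical page head; the real file names must come from the actual page.
SAMPLE_HTML = """
<html><head>
<link rel="stylesheet" href="style.css">
<script src="../js/common.js"></script>
</head><body></body></html>
"""

BASE_URL = "http://61.12.27.181:8080/opac/html/checkouts"

if __name__ == "__main__":
    for url in collect_asset_urls(SAMPLE_HTML, BASE_URL):
        print(url)
```

With a logged-in RoboBrowser instance, `fetch_assets(browser.session, collect_asset_urls(str(browser.parsed), browser.url))` would issue the requests through the same session, after which the checkouts page can be opened again and inspected.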

Good luck!