Need help extracting data from HTML with bs4 when the page uses frames/iframes

Time: 2019-06-14 04:26:53

Tags: python selenium beautifulsoup

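I am trying to extract data from a table on a website. I have checked several solutions online, but I still cannot get any data. After that, I want to export the whole table to a CSV file.

I am using requests.get to fetch the HTML and bs4 to parse it.

Here is the code: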

page = requests.get(driver.current_url)
soup = BeautifulSoup(page,'html.parser')
frames = soup.findAll('frame',{'name':'main'})
links = my_table.findAll('tr',{'class'})

I have posted the HTML below because the website requires a login.

<frameset rows="40,*,20" frameborder="0" border="0">
        <frame name="nav" src="header.do" scrolling="no" noresize="">
        <frame name="Main" src="menu.do">
        <frame name="cr" src="cr.jsp" scrolling="no" noresize="">
        <noframes><body></body></noframes>
    </frameset>
    <frame name="Main" src="menu.do">
    <html><head>
    <base href="https://www.portnet.com/BASWeb/com/pn2/bas/web/BerthingEnquiry/BerthingEnquiryPage.jsp">
    <title>Berthing Enquiry</title>
    <link href="/BASWeb/resources/css/styles.css" type="text/css" rel="stylesheet">
    <body bgcolor="#fafafa" onload="postRender(this);" style="margin:0px">

  <iframe id="rpc" name="rpc" style="width:0px; height:0px; border: 0px" src=""></iframe>

    <table cellspacing="0" cellpadding="5" border="0" width="100%" style="border-bottom:1px solid #cccccc"><tbody><tr>
    <td align="left" style="border:0px" valign="bottom"><div class="pagehead">Berthing Enquiry</div></td>
    <td align="right" style="border:0px" valign="bottom" nowrap=""><small>Zoom:<input type="text" name="pagezoom" value="100" maxlength="3" size="1" onchange="changeDocumentZoom(this.value)">%&nbsp;
    <a href="" onclick="window.focus(); window.print(); return false" target="rpc"><img src="/BASWeb/resources/images/print.gif" height="16" width="16" border="0">&nbsp;Print</a>
    &nbsp;&nbsp;&nbsp;14-06-2019 12:06:45 SGT
    </small></td></tr></tbody></table><br>

    <form name="enquireBerthingVesselForm" action="/BASWeb/com/pn2/bas/web/BerthingEnquiry/enquireBerthingVessel.do" method="post">    
    <table border="0" width="700" align="center">
<tbody><tr valign="top">
<td style="border:0px">     
        <table align="center" width="100%" cellpadding="3" cellspacing="0" border="0" bordercolor="#cccccc">       
            <tbody><tr>
                </tr>
        </tbody></table>
        <table align="center" width="100%" cellpadding="3" cellspacing="0" border="1" bordercolor="#cccccc" class="table3d">
            <tbody><tr>
                <td class="rowhead" width="20%" align="center">*Vessel</td>

                <td align="right" width="15%" class="tablebody"> <input type="hidden" name="wlw-select_key:{actionForm.fullAbbr}OldValue" value="true"><select name="wlw-select_key:{actionForm.fullAbbr}"><option value="Abbreviated" selected="">Abbreviated</option><option value="Full">Full</option></select> </td>

                <td width="46%" class="tablebody">                
                    <input type="text" name="{actionForm.vslName}" class="vsl" value="XING PING" maxlength="35" size="35">                  
                </td>

                <td rowspan="2" width="19%" class="tablebody"><input type="submit" value="Search"></td>
            </tr>                
            <tr>

                <td class="rowhead" width="20%" align="center">Voyage</td>

                <td align="right" width="15%" class="tablebody"><input type="hidden" name="wlw-select_key:{actionForm.inOutVoy}OldValue" value="true"><select name="wlw-select_key:{actionForm.inOutVoy}"><option value="OUT" selected="">OUT</option><option value="IN">IN</option></select></td>

                <td width="46%" class="tablebody"> 

                    <input type="text" name="{actionForm.voyage}" class="voy" value="" maxlength="17" size="17"> 

            </tr>
        </tbody></table>

                <table align="center" width="100%" cellpadding="3" cellspacing="0" border="0" bordercolor="#cccccc">                
                    <tbody><tr>
                        <td colspan="8">2. The following matching berthing record(s) are found. Click on the vessel to view the berthing details.</td>
                    </tr>
                </tbody></table>                     
                <table align="center" width="100%" cellpadding="3" cellspacing="0" border="1" bordercolor="#cccccc" class="altrows">
                    <tbody><tr class="row0">                                                
                        <td width="5%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">Line<div>Shipping Line Code</div></a></small></td>
                        <td width="9%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">Service<div>PSA Service Route Code</div></a></small></td>
                        <td width="9%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">Vessel<div>Vessel Name</div></a></small></td>
                        <td width="9%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">Abbr Vessel<div>Abbreviated Vessel Name</div></a></small></td>                        
                        <td width="9%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">In Voy<div>In Voyage</div></a></small></td> 
                        <td width="9%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">Out Voy<div>Out Voyage</div></a></small></td>                       
                        <td width="9%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">VO<div>Vessel Operator</div></a></small></td>                        
                        <td width="14%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">Berth Time<div>Berth Time</div></a></small></td>                        
                        <td width="14%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">Unberth Time<div>Unberth Time</div></a></small></td>
                        <td width="5%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">Berth<div>Berth No</div></a></small></td>
                        <td width="8%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">Status<div>Berthing Status</div></a></small></td>

                        <!--end add-->                        
                    </tr>              
                    <tr class="row1">                                    
                        <td class="tablebody" align="left" nowrap=""><small>SUD</small></td>
                        <td class="tablebody" align="left" nowrap=""><small>CC1</small></td>

                        <td class="tablebody" align="left" nowrap="">
                        <a href="/BASWeb/com/pn2/bas/web/BerthingEnquiry/getVslDetail.do?voyage=022&amp;bthgStatus=A&amp;vessel=XING+PING" target="rpc"><small>XING PING</small>

                        </td>
                        <td class="tablebody" align="left" nowrap=""><small>XP VESSEL</small></td>
                        <td class="tablebody" align="left" nowrap=""><small>022</small></td>
                        <td class="tablebody" align="left" nowrap=""><small>022</small></td>

                        <!-- code changed to show the agent desc for vessel operator-->                        
                        <td class="tablebody" align="left" nowrap=""><small>ABC S</small></td>
                        <td class="tablebody" align="left" nowrap=""><small>&nbsp;+21-07-2019 18:00&nbsp;</small></td>
                        <td class="tablebody" align="left" nowrap=""><small>&nbsp;+22-07-2019 12:00&nbsp;</small></td>
                        <td class="tablebody" align="left" nowrap=""><small>CT</small></td>
                        <td class="tablebody" align="left" nowrap=""><small>Active</small></td>
                        <!--end add-->                        
                    </tr>
                    <tr class="row0">                                              
                        <td class="tablebody" align="left" nowrap=""><small>SUD</small></td>
                        <td class="tablebody" align="left" nowrap=""><small>CC1</small></td>

                        <td class="tablebody" align="left" nowrap="">
                        <a href="/BASWeb/com/pn2/bas/web/BerthingEnquiry/getVslDetail.do?voyage=021&amp;bthgStatus=A&amp;vessel=XING+PING" target="rpc"><small>XING PING</small>

                        </td>
                        <td class="tablebody" align="left" nowrap=""><small>XP VESSEL</small></td>
                        <td class="tablebody" align="left" nowrap=""><small>021</small></td>
                        <td class="tablebody" align="left" nowrap=""><small>021</small></td>                   
                        <td class="tablebody" align="left" nowrap=""><small>ABC S</small></td>
                        <td class="tablebody" align="left" nowrap=""><small>&nbsp;+06-07-2019 18:00&nbsp;</small></td>
                        <td class="tablebody" align="left" nowrap=""><small>&nbsp;+07-07-2019 12:00&nbsp;</small></td>
                        <td class="tablebody" align="left" nowrap=""><small>CT</small></td>
                        <td class="tablebody" align="left" nowrap=""><small>Active</small></td>
                        <!--end add-->                      
                    </tr>  
            </tbody></table>
</form></body></html>
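
A note on why the snippet in the question probably returns nothing, with a hedged sketch of one way around it. `requests.get(driver.current_url)` opens a new HTTP session that does not share the Selenium browser's login, and the top-level document is only the frameset, while the table lives in the document loaded by the frame named `Main` (capital M; the question's code searches for `main`). The sketch below assumes `driver` is an already logged-in Selenium WebDriver. The helper names `frame_html`, `rows_from_results_table`, and `write_csv` are hypothetical, and the `altrows` class is taken from the HTML above.

```python
import csv

from bs4 import BeautifulSoup


def frame_html(driver, frame_name="Main"):
    """Switch into the named frame and return that frame's HTML.

    `driver` is assumed to be a Selenium WebDriver that is already
    logged in and showing the frameset page.
    """
    driver.switch_to.frame(frame_name)
    try:
        return driver.page_source
    finally:
        # Always return to the top-level document afterwards.
        driver.switch_to.default_content()


def rows_from_results_table(html):
    """Return the cell text of every row in the table with class "altrows"."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="altrows")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        cells = []
        for td in tr.find_all("td"):
            # Header cells nest a tooltip <div> inside the link; drop it
            # so only the short label (for example "Line") remains.
            for tip in td.find_all("div"):
                tip.decompose()
            cells.append(td.get_text(strip=True))
        if cells:
            rows.append(cells)
    return rows


def write_csv(rows, path):
    """Write the extracted rows to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
```

With a live session this might be called as `write_csv(rows_from_results_table(frame_html(driver)), "berthing.csv")`; the file name is only an example.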

1 Answer:

Answer 0 (score: 0)

Use the pandas library to write the table data to a CSV file:

import requests
import pandas as pd

html = '''
        <html><head>
        <base href="https://www.portnet.com/BASWeb/com/pn2/bas/web/BerthingEnquiry/BerthingEnquiryPage.jsp">
        <title>Berthing Enquiry</title>
        <link href="/BASWeb/resources/css/styles.css" type="text/css" rel="stylesheet">
        <body bgcolor="#fafafa" onload="postRender(this);" style="margin:0px">

      <iframe id="rpc" name="rpc" style="width:0px; height:0px; border: 0px" src=""></iframe>

        <table cellspacing="0" cellpadding="5" border="0" width="100%" style="border-bottom:1px solid #cccccc"><tbody><tr>
        <td align="left" style="border:0px" valign="bottom"><div class="pagehead">Berthing Enquiry</div></td>
        <td align="right" style="border:0px" valign="bottom" nowrap=""><small>Zoom:<input type="text" name="pagezoom" value="100" maxlength="3" size="1" onchange="changeDocumentZoom(this.value)">%&nbsp;
        <a href="" onclick="window.focus(); window.print(); return false" target="rpc"><img src="/BASWeb/resources/images/print.gif" height="16" width="16" border="0">&nbsp;Print</a>
        &nbsp;&nbsp;&nbsp;14-06-2019 12:06:45 SGT
        </small></td></tr></tbody></table><br>

        <form name="enquireBerthingVesselForm" action="/BASWeb/com/pn2/bas/web/BerthingEnquiry/enquireBerthingVessel.do" method="post">    
        <table border="0" width="700" align="center">
    <tbody><tr valign="top">
    <td style="border:0px">     
            <table align="center" width="100%" cellpadding="3" cellspacing="0" border="0" bordercolor="#cccccc">       
                <tbody><tr>
                    </tr>
            </tbody></table>
            <table align="center" width="100%" cellpadding="3" cellspacing="0" border="1" bordercolor="#cccccc" class="table3d">
                <tbody><tr>
                    <td class="rowhead" width="20%" align="center">*Vessel</td>

                    <td align="right" width="15%" class="tablebody"> <input type="hidden" name="wlw-select_key:{actionForm.fullAbbr}OldValue" value="true"><select name="wlw-select_key:{actionForm.fullAbbr}"><option value="Abbreviated" selected="">Abbreviated</option><option value="Full">Full</option></select> </td>

                    <td width="46%" class="tablebody">                
                        <input type="text" name="{actionForm.vslName}" class="vsl" value="XING PING" maxlength="35" size="35">                  
                    </td>

                    <td rowspan="2" width="19%" class="tablebody"><input type="submit" value="Search"></td>
                </tr>                
                <tr>

                    <td class="rowhead" width="20%" align="center">Voyage</td>

                    <td align="right" width="15%" class="tablebody"><input type="hidden" name="wlw-select_key:{actionForm.inOutVoy}OldValue" value="true"><select name="wlw-select_key:{actionForm.inOutVoy}"><option value="OUT" selected="">OUT</option><option value="IN">IN</option></select></td>

                    <td width="46%" class="tablebody"> 

                        <input type="text" name="{actionForm.voyage}" class="voy" value="" maxlength="17" size="17"> 

                </tr>
            </tbody></table>

                    <table align="center" width="100%" cellpadding="3" cellspacing="0" border="0" bordercolor="#cccccc">                
                        <tbody><tr>
                            <td colspan="8">2. The following matching berthing record(s) are found. Click on the vessel to view the berthing details.</td>
                        </tr>
                    </tbody></table>                     
                    <table align="center" width="100%" cellpadding="3" cellspacing="0" border="1" bordercolor="#cccccc" class="altrows">
                        <tbody><tr class="row0">                                                
                            <td width="5%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">Line<div>Shipping Line Code</div></a></small></td>
                            <td width="9%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">Service<div>PSA Service Route Code</div></a></small></td>
                            <td width="9%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">Vessel<div>Vessel Name</div></a></small></td>
                            <td width="9%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">Abbr Vessel<div>Abbreviated Vessel Name</div></a></small></td>                        
                            <td width="9%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">In Voy<div>In Voyage</div></a></small></td> 
                            <td width="9%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">Out Voy<div>Out Voyage</div></a></small></td>                       
                            <td width="9%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">VO<div>Vessel Operator</div></a></small></td>                        
                            <td width="14%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">Berth Time<div>Berth Time</div></a></small></td>                        
                            <td width="14%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">Unberth Time<div>Unberth Time</div></a></small></td>
                            <td width="5%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">Berth<div>Berth No</div></a></small></td>
                            <td width="8%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">Status<div>Berthing Status</div></a></small></td>

                            <!--end add-->                        
                        </tr>              
                        <tr class="row1">                                    
                            <td class="tablebody" align="left" nowrap=""><small>SUD</small></td>
                            <td class="tablebody" align="left" nowrap=""><small>CC1</small></td>

                            <td class="tablebody" align="left" nowrap="">
                            <a href="/BASWeb/com/pn2/bas/web/BerthingEnquiry/getVslDetail.do?voyage=022&amp;bthgStatus=A&amp;vessel=XING+PING" target="rpc"><small>XING PING</small>

                            </td>
                            <td class="tablebody" align="left" nowrap=""><small>XP VESSEL</small></td>
                            <td class="tablebody" align="left" nowrap=""><small>022</small></td>
                            <td class="tablebody" align="left" nowrap=""><small>022</small></td>

                            <!-- code changed to show the agent desc for vessel operator-->                        
                            <td class="tablebody" align="left" nowrap=""><small>ABC S</small></td>
                            <td class="tablebody" align="left" nowrap=""><small>&nbsp;+21-07-2019 18:00&nbsp;</small></td>
                            <td class="tablebody" align="left" nowrap=""><small>&nbsp;+22-07-2019 12:00&nbsp;</small></td>
                            <td class="tablebody" align="left" nowrap=""><small>CT</small></td>
                            <td class="tablebody" align="left" nowrap=""><small>Active</small></td>
                            <!--end add-->                        
                        </tr>
                        <tr class="row0">                                              
                            <td class="tablebody" align="left" nowrap=""><small>SUD</small></td>
                            <td class="tablebody" align="left" nowrap=""><small>CC1</small></td>

                            <td class="tablebody" align="left" nowrap="">
                            <a href="/BASWeb/com/pn2/bas/web/BerthingEnquiry/getVslDetail.do?voyage=021&amp;bthgStatus=A&amp;vessel=XING+PING" target="rpc"><small>XING PING</small>

                            </td>
                            <td class="tablebody" align="left" nowrap=""><small>XP VESSEL</small></td>
                            <td class="tablebody" align="left" nowrap=""><small>021</small></td>
                            <td class="tablebody" align="left" nowrap=""><small>021</small></td>                   
                            <td class="tablebody" align="left" nowrap=""><small>ABC S</small></td>
                            <td class="tablebody" align="left" nowrap=""><small>&nbsp;+06-07-2019 18:00&nbsp;</small></td>
                            <td class="tablebody" align="left" nowrap=""><small>&nbsp;+07-07-2019 12:00&nbsp;</small></td>
                            <td class="tablebody" align="left" nowrap=""><small>CT</small></td>
                            <td class="tablebody" align="left" nowrap=""><small>Active</small></td>
                            <!--end add-->                      
                        </tr>  
                </tbody></table>
    </form></body></html>'''

# Fetch all tables on the page.
# If you are using the requests library, try this instead:
#
#     response = requests.get(url).text
#     tables = pd.read_html(response)
#
tables = pd.read_html(html)

# The berthing results table is the fifth table that pandas finds
print(tables[4])
# Write the table data to the `table_data` CSV file
tables[4].to_csv("table_data")
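
Picking the table by position (`tables[4]`) depends on how many tables the page happens to contain. As a minimal sketch of an alternative, `pd.read_html` accepts an `attrs` argument that selects tables by attribute, so the results table can be matched on its `altrows` class instead. The function names `results_table` and `table_to_csv` are hypothetical. The HTML is wrapped in `StringIO` because recent pandas versions deprecate passing a literal HTML string, and `header=0` is an assumption that the first row holds the column titles, as it does in the HTML above.

```python
from io import StringIO

import pandas as pd


def results_table(html):
    """Return the table whose class attribute is "altrows" as a DataFrame."""
    tables = pd.read_html(
        StringIO(html),
        attrs={"class": "altrows"},  # match on the attribute, not on position
        header=0,                    # use the first row as column names
    )
    return tables[0]


def table_to_csv(html, csv_path):
    """Write the results table to csv_path without the DataFrame index."""
    results_table(html).to_csv(csv_path, index=False)
```

For example, `table_to_csv(html, "table_data.csv")` would write only the berthing rows. Note that the header cells in this page include tooltip text, so the column names may need tidying afterwards.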