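I'm trying to extract data from a table on a website. I've checked several solutions online, but I still can't get any data. After that, I want to export the whole table to a CSV file.
I'm using requests.get to fetch the HTML and bs4 (BeautifulSoup) to parse it.
Here is the code: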
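import requests
from bs4 import BeautifulSoup

page = requests.get(driver.current_url)           # driver: an existing Selenium WebDriver
soup = BeautifulSoup(page.text, 'html.parser')    # parse the response text, not the Response object
frames = soup.findAll('frame', {'name': 'Main'})  # attribute match is case-sensitive: the frame is named "Main"
links = my_table.findAll('tr', {'class'})         # note: my_table is never defined, and {'class'} is a set, not an attribute dict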
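A likely reason no data comes back: `requests.get(driver.current_url)` sends a fresh request that does not carry the cookies of the logged-in Selenium browser, so the server probably answers with the login page rather than the enquiry page. A minimal sketch, assuming the login is done in Selenium and the site keeps the session in cookies, copies the browser's cookies into a `requests.Session`. The helper name `session_from_selenium_cookies` and the demo cookie below are hypothetical.

```python
import requests


def session_from_selenium_cookies(cookies):
    """Build a requests.Session carrying the cookies of a logged-in browser.

    `cookies` is the list of dicts returned by Selenium's driver.get_cookies();
    only the name, value, domain and path keys are used here.
    """
    session = requests.Session()
    for c in cookies:
        session.cookies.set(
            c["name"], c["value"],
            domain=c.get("domain", ""), path=c.get("path", "/"),
        )
    return session


# Usage with an existing, logged-in Selenium `driver` (not run here):
#     session = session_from_selenium_cookies(driver.get_cookies())
#     page = session.get(driver.current_url)
#     soup = BeautifulSoup(page.text, "html.parser")

# Offline demo with a made-up cookie (hypothetical name and value):
demo = session_from_selenium_cookies(
    [{"name": "JSESSIONID", "value": "abc123", "domain": "www.portnet.com", "path": "/"}]
)
```

Reusing one session also means any later requests (for example to a frame's URL) are sent with the same login cookies.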
I've posted the HTML below because the website requires a login.
<frameset rows="40,*,20" frameborder="0" border="0">
<frame name="nav" src="header.do" scrolling="no" noresize="">
<frame name="Main" src="menu.do">
<frame name="cr" src="cr.jsp" scrolling="no" noresize="">
<noframes><body></body></noframes>
</frameset>
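Even with a logged-in session, the page at `driver.current_url` appears to be only this frameset; the table lives in the document loaded by the `Main` frame, so that frame's `src` has to be fetched separately. A sketch, assuming the frameset markup above and a hypothetical page URL (`PAGE_URL`; the real path is not shown in the question). With Selenium alone, `driver.switch_to.frame("Main")` followed by `driver.page_source` is another way to reach the frame's HTML.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# The frameset markup from the question.
FRAMESET = """
<frameset rows="40,*,20" frameborder="0" border="0">
<frame name="nav" src="header.do" scrolling="no" noresize="">
<frame name="Main" src="menu.do">
<frame name="cr" src="cr.jsp" scrolling="no" noresize="">
<noframes><body></body></noframes>
</frameset>
"""

# Hypothetical URL of the frameset page (in the question this is driver.current_url).
PAGE_URL = "https://www.portnet.com/BASWeb/index.jsp"


def main_frame_url(frameset_html, page_url):
    """Return the absolute URL of the document shown in the 'Main' frame."""
    soup = BeautifulSoup(frameset_html, "html.parser")
    # Attribute values are matched exactly, so 'main' would not find name="Main".
    frame = soup.find("frame", attrs={"name": "Main"})
    # The src is relative ("menu.do"), so resolve it against the page URL.
    return urljoin(page_url, frame["src"])


url = main_frame_url(FRAMESET, PAGE_URL)
```

The resulting URL would then be requested with the logged-in session, and the table parsed from that response.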
The HTML loaded inside the Main frame (menu.do):
<html><head>
<base href="https://www.portnet.com/BASWeb/com/pn2/bas/web/BerthingEnquiry/BerthingEnquiryPage.jsp">
<title>Berthing Enquiry</title>
<link href="/BASWeb/resources/css/styles.css" type="text/css" rel="stylesheet">
<body bgcolor="#fafafa" onload="postRender(this);" style="margin:0px">
<iframe id="rpc" name="rpc" style="width:0px; height:0px; border: 0px" src=""></iframe>
<table cellspacing="0" cellpadding="5" border="0" width="100%" style="border-bottom:1px solid #cccccc"><tbody><tr>
<td align="left" style="border:0px" valign="bottom"><div class="pagehead">Berthing Enquiry</div></td>
<td align="right" style="border:0px" valign="bottom" nowrap=""><small>Zoom:<input type="text" name="pagezoom" value="100" maxlength="3" size="1" onchange="changeDocumentZoom(this.value)">%
<a href="" onclick="window.focus(); window.print(); return false" target="rpc"><img src="/BASWeb/resources/images/print.gif" height="16" width="16" border="0"> Print</a>
14-06-2019 12:06:45 SGT
</small></td></tr></tbody></table><br>
<form name="enquireBerthingVesselForm" action="/BASWeb/com/pn2/bas/web/BerthingEnquiry/enquireBerthingVessel.do" method="post">
<table border="0" width="700" align="center">
<tbody><tr valign="top">
<td style="border:0px">
<table align="center" width="100%" cellpadding="3" cellspacing="0" border="0" bordercolor="#cccccc">
<tbody><tr>
</tr>
</tbody></table>
<table align="center" width="100%" cellpadding="3" cellspacing="0" border="1" bordercolor="#cccccc" class="table3d">
<tbody><tr>
<td class="rowhead" width="20%" align="center">*Vessel</td>
<td align="right" width="15%" class="tablebody"> <input type="hidden" name="wlw-select_key:{actionForm.fullAbbr}OldValue" value="true"><select name="wlw-select_key:{actionForm.fullAbbr}"><option value="Abbreviated" selected="">Abbreviated</option><option value="Full">Full</option></select> </td>
<td width="46%" class="tablebody">
<input type="text" name="{actionForm.vslName}" class="vsl" value="XING PING" maxlength="35" size="35">
</td>
<td rowspan="2" width="19%" class="tablebody"><input type="submit" value="Search"></td>
</tr>
<tr>
<td class="rowhead" width="20%" align="center">Voyage</td>
<td align="right" width="15%" class="tablebody"><input type="hidden" name="wlw-select_key:{actionForm.inOutVoy}OldValue" value="true"><select name="wlw-select_key:{actionForm.inOutVoy}"><option value="OUT" selected="">OUT</option><option value="IN">IN</option></select></td>
<td width="46%" class="tablebody">
<input type="text" name="{actionForm.voyage}" class="voy" value="" maxlength="17" size="17">
</tr>
</tbody></table>
<table align="center" width="100%" cellpadding="3" cellspacing="0" border="0" bordercolor="#cccccc">
<tbody><tr>
<td colspan="8">2. The following matching berthing record(s) are found. Click on the vessel to view the berthing details.</td>
</tr>
</tbody></table>
<table align="center" width="100%" cellpadding="3" cellspacing="0" border="1" bordercolor="#cccccc" class="altrows">
<tbody><tr class="row0">
<td width="5%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">Line<div>Shipping Line Code</div></a></small></td>
<td width="9%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">Service<div>PSA Service Route Code</div></a></small></td>
<td width="9%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">Vessel<div>Vessel Name</div></a></small></td>
<td width="9%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">Abbr Vessel<div>Abbreviated Vessel Name</div></a></small></td>
<td width="9%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">In Voy<div>In Voyage</div></a></small></td>
<td width="9%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">Out Voy<div>Out Voyage</div></a></small></td>
<td width="9%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">VO<div>Vessel Operator</div></a></small></td>
<td width="14%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">Berth Time<div>Berth Time</div></a></small></td>
<td width="14%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">Unberth Time<div>Unberth Time</div></a></small></td>
<td width="5%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">Berth<div>Berth No</div></a></small></td>
<td width="8%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">Status<div>Berthing Status</div></a></small></td>
<!--end add-->
</tr>
<tr class="row1">
<td class="tablebody" align="left" nowrap=""><small>SUD</small></td>
<td class="tablebody" align="left" nowrap=""><small>CC1</small></td>
<td class="tablebody" align="left" nowrap="">
<a href="/BASWeb/com/pn2/bas/web/BerthingEnquiry/getVslDetail.do?voyage=022&bthgStatus=A&vessel=XING+PING" target="rpc"><small>XING PING</small>
</td>
<td class="tablebody" align="left" nowrap=""><small>XP VESSEL</small></td>
<td class="tablebody" align="left" nowrap=""><small>022</small></td>
<td class="tablebody" align="left" nowrap=""><small>022</small></td>
<!-- code changed to show the agent desc for vessel operator-->
<td class="tablebody" align="left" nowrap=""><small>ABC S</small></td>
<td class="tablebody" align="left" nowrap=""><small> +21-07-2019 18:00 </small></td>
<td class="tablebody" align="left" nowrap=""><small> +22-07-2019 12:00 </small></td>
<td class="tablebody" align="left" nowrap=""><small>CT</small></td>
<td class="tablebody" align="left" nowrap=""><small>Active</small></td>
<!--end add-->
</tr>
<tr class="row0">
<td class="tablebody" align="left" nowrap=""><small>SUD</small></td>
<td class="tablebody" align="left" nowrap=""><small>CC1</small></td>
<td class="tablebody" align="left" nowrap="">
<a href="/BASWeb/com/pn2/bas/web/BerthingEnquiry/getVslDetail.do?voyage=021&bthgStatus=A&vessel=XING+PING" target="rpc"><small>XING PING</small>
</td>
<td class="tablebody" align="left" nowrap=""><small>XP VESSEL</small></td>
<td class="tablebody" align="left" nowrap=""><small>021</small></td>
<td class="tablebody" align="left" nowrap=""><small>021</small></td>
<td class="tablebody" align="left" nowrap=""><small>ABC S</small></td>
<td class="tablebody" align="left" nowrap=""><small> +06-07-2019 18:00 </small></td>
<td class="tablebody" align="left" nowrap=""><small> +07-07-2019 12:00 </small></td>
<td class="tablebody" align="left" nowrap=""><small>CT</small></td>
<td class="tablebody" align="left" nowrap=""><small>Active</small></td>
<!--end add-->
</tr>
</tbody></table>
</form></body></html>
Answer
Use the pandas library to write the table data to a CSV file:
import requests
import pandas as pd
html = '''
<html><head>
<base href="https://www.portnet.com/BASWeb/com/pn2/bas/web/BerthingEnquiry/BerthingEnquiryPage.jsp">
<title>Berthing Enquiry</title>
<link href="/BASWeb/resources/css/styles.css" type="text/css" rel="stylesheet">
<body bgcolor="#fafafa" onload="postRender(this);" style="margin:0px">
<iframe id="rpc" name="rpc" style="width:0px; height:0px; border: 0px" src=""></iframe>
<table cellspacing="0" cellpadding="5" border="0" width="100%" style="border-bottom:1px solid #cccccc"><tbody><tr>
<td align="left" style="border:0px" valign="bottom"><div class="pagehead">Berthing Enquiry</div></td>
<td align="right" style="border:0px" valign="bottom" nowrap=""><small>Zoom:<input type="text" name="pagezoom" value="100" maxlength="3" size="1" onchange="changeDocumentZoom(this.value)">%
<a href="" onclick="window.focus(); window.print(); return false" target="rpc"><img src="/BASWeb/resources/images/print.gif" height="16" width="16" border="0"> Print</a>
14-06-2019 12:06:45 SGT
</small></td></tr></tbody></table><br>
<form name="enquireBerthingVesselForm" action="/BASWeb/com/pn2/bas/web/BerthingEnquiry/enquireBerthingVessel.do" method="post">
<table border="0" width="700" align="center">
<tbody><tr valign="top">
<td style="border:0px">
<table align="center" width="100%" cellpadding="3" cellspacing="0" border="0" bordercolor="#cccccc">
<tbody><tr>
</tr>
</tbody></table>
<table align="center" width="100%" cellpadding="3" cellspacing="0" border="1" bordercolor="#cccccc" class="table3d">
<tbody><tr>
<td class="rowhead" width="20%" align="center">*Vessel</td>
<td align="right" width="15%" class="tablebody"> <input type="hidden" name="wlw-select_key:{actionForm.fullAbbr}OldValue" value="true"><select name="wlw-select_key:{actionForm.fullAbbr}"><option value="Abbreviated" selected="">Abbreviated</option><option value="Full">Full</option></select> </td>
<td width="46%" class="tablebody">
<input type="text" name="{actionForm.vslName}" class="vsl" value="XING PING" maxlength="35" size="35">
</td>
<td rowspan="2" width="19%" class="tablebody"><input type="submit" value="Search"></td>
</tr>
<tr>
<td class="rowhead" width="20%" align="center">Voyage</td>
<td align="right" width="15%" class="tablebody"><input type="hidden" name="wlw-select_key:{actionForm.inOutVoy}OldValue" value="true"><select name="wlw-select_key:{actionForm.inOutVoy}"><option value="OUT" selected="">OUT</option><option value="IN">IN</option></select></td>
<td width="46%" class="tablebody">
<input type="text" name="{actionForm.voyage}" class="voy" value="" maxlength="17" size="17">
</tr>
</tbody></table>
<table align="center" width="100%" cellpadding="3" cellspacing="0" border="0" bordercolor="#cccccc">
<tbody><tr>
<td colspan="8">2. The following matching berthing record(s) are found. Click on the vessel to view the berthing details.</td>
</tr>
</tbody></table>
<table align="center" width="100%" cellpadding="3" cellspacing="0" border="1" bordercolor="#cccccc" class="altrows">
<tbody><tr class="row0">
<td width="5%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">Line<div>Shipping Line Code</div></a></small></td>
<td width="9%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">Service<div>PSA Service Route Code</div></a></small></td>
<td width="9%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">Vessel<div>Vessel Name</div></a></small></td>
<td width="9%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">Abbr Vessel<div>Abbreviated Vessel Name</div></a></small></td>
<td width="9%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">In Voy<div>In Voyage</div></a></small></td>
<td width="9%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">Out Voy<div>Out Voyage</div></a></small></td>
<td width="9%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">VO<div>Vessel Operator</div></a></small></td>
<td width="14%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">Berth Time<div>Berth Time</div></a></small></td>
<td width="14%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">Unberth Time<div>Unberth Time</div></a></small></td>
<td width="5%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">Berth<div>Berth No</div></a></small></td>
<td width="8%" valign="middle" class="tablehead" nowrap=""><small><a class="tooltip" href="" onclick="return false;" target="rpc">Status<div>Berthing Status</div></a></small></td>
<!--end add-->
</tr>
<tr class="row1">
<td class="tablebody" align="left" nowrap=""><small>SUD</small></td>
<td class="tablebody" align="left" nowrap=""><small>CC1</small></td>
<td class="tablebody" align="left" nowrap="">
<a href="/BASWeb/com/pn2/bas/web/BerthingEnquiry/getVslDetail.do?voyage=022&bthgStatus=A&vessel=XING+PING" target="rpc"><small>XING PING</small>
</td>
<td class="tablebody" align="left" nowrap=""><small>XP VESSEL</small></td>
<td class="tablebody" align="left" nowrap=""><small>022</small></td>
<td class="tablebody" align="left" nowrap=""><small>022</small></td>
<!-- code changed to show the agent desc for vessel operator-->
<td class="tablebody" align="left" nowrap=""><small>ABC S</small></td>
<td class="tablebody" align="left" nowrap=""><small> +21-07-2019 18:00 </small></td>
<td class="tablebody" align="left" nowrap=""><small> +22-07-2019 12:00 </small></td>
<td class="tablebody" align="left" nowrap=""><small>CT</small></td>
<td class="tablebody" align="left" nowrap=""><small>Active</small></td>
<!--end add-->
</tr>
<tr class="row0">
<td class="tablebody" align="left" nowrap=""><small>SUD</small></td>
<td class="tablebody" align="left" nowrap=""><small>CC1</small></td>
<td class="tablebody" align="left" nowrap="">
<a href="/BASWeb/com/pn2/bas/web/BerthingEnquiry/getVslDetail.do?voyage=021&bthgStatus=A&vessel=XING+PING" target="rpc"><small>XING PING</small>
</td>
<td class="tablebody" align="left" nowrap=""><small>XP VESSEL</small></td>
<td class="tablebody" align="left" nowrap=""><small>021</small></td>
<td class="tablebody" align="left" nowrap=""><small>021</small></td>
<td class="tablebody" align="left" nowrap=""><small>ABC S</small></td>
<td class="tablebody" align="left" nowrap=""><small> +06-07-2019 18:00 </small></td>
<td class="tablebody" align="left" nowrap=""><small> +07-07-2019 12:00 </small></td>
<td class="tablebody" align="left" nowrap=""><small>CT</small></td>
<td class="tablebody" align="left" nowrap=""><small>Active</small></td>
<!--end add-->
</tr>
</tbody></table>
</form></body></html>'''
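# Fetch all tables on the page (pd.read_html needs lxml, or bs4 + html5lib, installed).
# If you fetch the page with the requests library instead, try:
#     response = requests.get(url).text
#     tables = pd.read_html(StringIO(response))
from io import StringIO

# Wrap the HTML in StringIO: pandas >= 2.1 deprecates passing a literal HTML string.
tables = pd.read_html(StringIO(html))
print(tables[4])
# Write the table data to the CSV file `table_data.csv` (without the DataFrame index)
tables[4].to_csv("table_data.csv", index=False)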
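If you would rather stay with BeautifulSoup, as in the question, here is a sketch that selects the result table by its `altrows` class (instead of relying on its position, `tables[4]`) and writes it with the standard `csv` module. It uses a trimmed copy of that table from the question; the helper names `table_to_rows` and `write_csv` are illustrative, not part of any library.

```python
import csv

from bs4 import BeautifulSoup

# Trimmed copy of the "altrows" result table (layout attributes dropped).
HTML = """
<table class="altrows"><tbody>
<tr class="row0">
<td class="tablehead"><small><a class="tooltip">Line<div>Shipping Line Code</div></a></small></td>
<td class="tablehead"><small><a class="tooltip">Service<div>PSA Service Route Code</div></a></small></td>
<td class="tablehead"><small><a class="tooltip">Vessel<div>Vessel Name</div></a></small></td>
<td class="tablehead"><small><a class="tooltip">Abbr Vessel<div>Abbreviated Vessel Name</div></a></small></td>
<td class="tablehead"><small><a class="tooltip">In Voy<div>In Voyage</div></a></small></td>
<td class="tablehead"><small><a class="tooltip">Out Voy<div>Out Voyage</div></a></small></td>
<td class="tablehead"><small><a class="tooltip">VO<div>Vessel Operator</div></a></small></td>
<td class="tablehead"><small><a class="tooltip">Berth Time<div>Berth Time</div></a></small></td>
<td class="tablehead"><small><a class="tooltip">Unberth Time<div>Unberth Time</div></a></small></td>
<td class="tablehead"><small><a class="tooltip">Berth<div>Berth No</div></a></small></td>
<td class="tablehead"><small><a class="tooltip">Status<div>Berthing Status</div></a></small></td>
</tr>
<tr class="row1">
<td class="tablebody"><small>SUD</small></td>
<td class="tablebody"><small>CC1</small></td>
<td class="tablebody">
<a href="/BASWeb/com/pn2/bas/web/BerthingEnquiry/getVslDetail.do?voyage=022&bthgStatus=A&vessel=XING+PING" target="rpc"><small>XING PING</small>
</td>
<td class="tablebody"><small>XP VESSEL</small></td>
<td class="tablebody"><small>022</small></td>
<td class="tablebody"><small>022</small></td>
<td class="tablebody"><small>ABC S</small></td>
<td class="tablebody"><small> +21-07-2019 18:00 </small></td>
<td class="tablebody"><small> +22-07-2019 12:00 </small></td>
<td class="tablebody"><small>CT</small></td>
<td class="tablebody"><small>Active</small></td>
</tr>
</tbody></table>
"""


def table_to_rows(html, table_class="altrows"):
    """Return the rows of the table with the given class as lists of strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_=table_class)
    rows = []
    for tr in table.find_all("tr"):
        # Take the first non-blank string in each cell: for header cells this is
        # the short label ("Line"), not the tooltip text in the nested <div>.
        rows.append([next(td.stripped_strings, "") for td in tr.find_all("td")])
    return rows


def write_csv(rows, path):
    """Write the rows to a CSV file (newline='' as the csv module expects)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


rows = table_to_rows(HTML)
write_csv(rows, "table_data.csv")
```

On the real page you would pass the full HTML of the Main frame to `table_to_rows`; the class-based lookup keeps working even if other tables are added before the result table.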