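I am trying to extract content from a table on a website; the table sits inside a <div> element. So far I can log in successfully, navigate to the specific page and load the report, but I am struggling to extract the data from that table.
I have found several sources, but unfortunately I have not been able to work them into my script.
This is what I have so far: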
Option Explicit
Const MyUserID As String = "test123"
Const MyPassword As String = "test123"
Const READYSTATE_COMPLETE As Integer = 4
Dim objIE As Object
Dim sPageHTML As String

Public Sub LoginScript()
    Set objIE = CreateObject("InternetExplorer.Application")
    With objIE
        .Visible = True
        .Silent = True
        .Navigate "https://www.mywebsite.com"
        Do Until .ReadyState = READYSTATE_COMPLETE
            DoEvents
        Loop
        Application.Wait Now() + TimeValue("00:00:02")
        .Document.all.txtuserid.Value = MyUserID
        .Document.all.txtPassword.Value = MyPassword
        .Document.getElementsByName("btnSubmit")(0).Click
        Do Until .ReadyState = READYSTATE_COMPLETE
            DoEvents
        Loop
        Application.Wait Now() + TimeValue("00:00:02")
        .Navigate "https://www.mywebsite.com/sample.html"
        Do Until .ReadyState = READYSTATE_COMPLETE
            DoEvents
        Loop
        .Document.getElementsByName("LocationID")(0).Value = "7"
        .Document.getElementsByName("endDay")(0).Value = "31"
        .Document.getElementsByName("endMonth")(0).Value = "12"
        .Document.getElementsByName("endYear")(0).Value = "2018"
        .Document.getElementsByName("view")(0).Click
        Do Until .ReadyState = READYSTATE_COMPLETE
            DoEvents
        Loop
        Application.Wait Now() + TimeValue("00:00:02")
        ThisWorkbook.Sheets("Sheet2").Activate
        Range("A1:K500").ClearContents
        sPageHTML = .Document.getElementByID("printOut").innerText
        ThisWorkbook.Sheets(2).Range("A1") = sPageHTML
        .Navigate "https://www.mywebsite.com/logout"
        Do Until .ReadyState = READYSTATE_COMPLETE
            DoEvents
        Loop
        .Quit
        Shell "RunDll32.exe InetCpl.Cpl, ClearMyTracksByProcess 11"
    End With
End Sub
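I understand that the following line
ThisWorkbook.Sheets(2).Range("A1") = sPageHTML
puts the entire content into that single cell. Unfortunately, none of the options I have tried managed to assign each TD to its own cell.
The ID "printOut" belongs to the <div> element.
Here is the HTML content of the <div> element: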
<div id="printOut">
<table>
<tbody>
<tr>
<td>
<table>
<tbody>
<tr>
<td></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table><br>
<table align="center" border="1" cellpadding="1" cellspacing="0">
<tbody>
<tr>
<td>
<table align="center" bgcolor="#0080C0" border="0" cellpadding="0" cellspacing="0" width="100%">
<tbody>
<tr>
<td>
<table cellpadding="0" cellspacing="0">
<tbody>
<tr>
<td align="center" bgcolor="#0080C0" class="white" nowrap width="150">
Ventura<br>
<a class="red" href="javascript:WinOpenPopup('s_ReservationBufferAdd.asp?dir=add1&goto=Unallocating&CarSizeID=&PickupLocationID=')" title="Book"><img border="0" src="images/b_Unallocat.jpg"></a>
</td>
</tr>
<tr bgcolor="#FFFFFF">
<td class="texunderline"><b>Available</b></td>
</tr>
<tr bgcolor="#FFFFFF">
<td class="texunderline"><b>PickUp</b></td>
</tr>
<tr bgcolor="#FFFFFF">
<td class="texunderline"><b>Dropoff</b></td>
</tr>
</tbody>
</table>
</td>
<td>
<table border="0" cellpadding="0" cellspacing="0">
<tbody>
<tr>
<td align="right" bgcolor="#0080C0" class="white" nowrap>
<a class="sheetwhite" href="javascript:WinOpenPopup('s_bookingListDaily.asp?date=2018-02-07&LocationID=7&CategoryID=3')" title="Wednesday"><small><font color="#E6E600"></font></small><br>
7 Feb </a>
</td>
</tr>
<tr bgcolor="#FFFFFF">
<td align="right" bgcolor="#FF6262" class="white" width="50">0</td>
</tr>
<tr bgcolor="#FFFFFF">
<td align="right" class="texunderline" width="50">0</td>
</tr>
<tr bgcolor="#FFFFFF">
<td align="right" class="texunderline" width="50">0</td>
</tr>
</tbody>
</table>
</td>
<td>
<table border="0" cellpadding="0" cellspacing="0">
<tbody>
<tr>
<td align="right" bgcolor="#0080C0" class="white" nowrap>
<a class="sheetwhite" href="javascript:WinOpenPopup('s_bookingListDaily.asp?date=2018-02-08&LocationID=7&CategoryID=3')" title="Thursday"><small><font color="#E6E600"></font></small><br>
8 Feb </a>
</td>
</tr>
<tr bgcolor="#FFFFFF">
<td align="right" bgcolor="#FF6262" class="white" width="50">0</td>
</tr>
<tr bgcolor="#FFFFFF">
<td align="right" class="texunderline" width="50">0</td>
</tr>
<tr bgcolor="#FFFFFF">
<td align="right" class="texunderline" width="50">0</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table><br>
<table align="center" border="1" cellpadding="1" cellspacing="0">
<tbody>
<tr>
<td>
<table align="center" bgcolor="#0080C0" border="0" cellpadding="0" cellspacing="0" width="100%">
<tbody>
<tr>
<td>
<table cellpadding="0" cellspacing="0">
<tbody>
<tr>
<td align="center" bgcolor="#0080C0" class="white" nowrap width="150">
Mavericks<br>
<a class="red" href="javascript:WinOpenPopup('s_ReservationBufferAdd.asp?dir=add1&goto=Unallocating&CarSizeID=&PickupLocationID=')" title="Book"><img border="0" src="images/b_Unallocat.jpg"></a>
</td>
</tr>
<tr bgcolor="#FFFFFF">
<td class="texunderline"><b>Available</b></td>
</tr>
<tr bgcolor="#FFFFFF">
<td class="texunderline"><b>PickUp</b></td>
</tr>
<tr bgcolor="#FFFFFF">
<td class="texunderline"><b>Dropoff</b></td>
</tr>
</tbody>
</table>
</td>
<td>
<table border="0" cellpadding="0" cellspacing="0">
<tbody>
<tr>
<td align="right" bgcolor="#0080C0" class="white" nowrap>
<a class="sheetwhite" href="javascript:WinOpenPopup('s_bookingListDaily.asp?date=2018-02-07&LocationID=7&CategoryID=2')" title="Wednesday"><small><font color="#E6E600"></font></small><br>
7 Feb </a>
</td>
</tr>
<tr bgcolor="#FFFFFF">
<td align="right" class="texunderline" width="50">9</td>
</tr>
<tr bgcolor="#FFFFFF">
<td align="right" class="texunderline" width="50">0</td>
</tr>
<tr bgcolor="#FFFFFF">
<td align="right" class="texunderline" width="50">1</td>
</tr>
</tbody>
</table>
</td>
<td>
<table border="0" cellpadding="0" cellspacing="0">
<tbody>
<tr>
<td align="right" bgcolor="#0080C0" class="white" nowrap>
<a class="sheetwhite" href="javascript:WinOpenPopup('s_bookingListDaily.asp?date=2018-02-08&LocationID=7&CategoryID=2')" title="Thursday"><small><font color="#E6E600"></font></small><br>
8 Feb </a>
</td>
</tr>
<tr bgcolor="#FFFFFF">
<td align="right" class="texunderline" width="50">10</td>
</tr>
<tr bgcolor="#FFFFFF">
<td align="right" class="texunderline" width="50">0</td>
</tr>
<tr bgcolor="#FFFFFF">
<td align="right" class="texunderline" width="50">2</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<form action="s_printToExcel.aspx" id="expExcel" method="post" name="expExcel" target="_blank">
<input id="htmlOut" name="htmlOut" type="hidden" value="">
</form>
</td>
</tr>
</tbody>
</table>
<table align="center">
<tbody>
<tr>
<td class="notetext">Note:</td>
<td class="notetext"></td>
</tr>
<tr>
<td class="notetext">1.</td>
<td class="notetext">Today's Available = Yesterday's Available + Yesterday's dropoff (does not include same day drop offs) - Today's Pickup + NewFleetNo - OffFleetNo .</td>
</tr>
<tr>
<td class="notetext">2.</td>
<td align="left" class="notetext">The report includes unallocated bookings</td>
</tr>
<tr>
<td class="notetext">3.</td>
<td class="notetext">If you click on the date, you can see a listing of the pickups and drop offs for this day and location.</td>
</tr>
<tr>
<td class="notetext">4.</td>
<td class="notetext">Please make sure all bookings have correct status, i.e., hired, returned, and cancelled if not hired. Incorrect status may cause incorrect vehicle current location.</td>
</tr>
</tbody>
</table><br>
<p></p>
</div>
I thought I would share a "solution" to my problem that partly works. It is far from perfect.
Set elemCollection = objIE.document.getElementByID("printOut").getElementsByTagName("TABLE")
For t = 0 To (elemCollection.Length - 1)
    For r = 0 To (elemCollection(t).Rows.Length - 1)
        For c = 0 To (elemCollection(t).Rows(r).Cells.Length - 1)
            ThisWorkbook.Worksheets(2).Cells(r + 1, c + 1) = elemCollection(t).Rows(r).Cells(c).innerText
        Next c
    Next r
Next t
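This extracts one of the tables inside the <div> element; strangely, when the report contains two tables, it does not return both. Content that spans several rows ends up in a single row of cells, which I then need to split, as shown below:
Below is a sample image of the report on the website.
Answer 0 (score: 0)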
You should not work on the div itself; work on the table instead, and then loop over every tr and every td. You could try something like this:
' Requires a reference to "Microsoft HTML Object Library" for the HTMLTable type
Dim table As HTMLTable
Set table = .document.getElementsByTagName("table")(0) ' replace 0 with the number of the table tag, counting from 0
Dim tr As Object, td As Object
Dim i As Long: i = 1 ' start at row 1
Dim j As Long
For Each tr In table.Children(0).Children ' Children(0) of a table is usually the tbody; use Children(1) if the table also contains a thead tag
    j = 1 ' restart at column 1 ("A") for every row
    For Each td In tr.Children
        Cells(i, j) = td.innerText
        j = j + 1
    Next td
    i = i + 1
Next tr
If you cannot reach the table by id, try class or tagName, or use .Children(i) on the div to reach the table and then its child elements.
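Edit: I have edited the code so that, apart from the number of the table tag, it should need no further changes. Just paste it below Range("A1:K500").ClearContents and leave a comment. The number of the table tag is the number you get by counting by hand each <table> tag in the order it appears in the HTML, starting from 0.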
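Edit2: I just saw the HTML. You can initialize the table variable directly with .document.getElementByID("printOut").children(0) instead of using .document.getElementsByTagName("table")(Number of the table tag, counting from 0). The .Children(i) method returns the i-th child element of a tag, that is, the i-th tag nested directly inside that particular tag. In this case, the first (0) child of the div is the first table tag inside that div. If that is the table you want, leave it as .Children(0). If not, try .Children(1) or .Children(2) to get the other tables in the div. Keep in mind that .Children counts every child element, not only tables: in the HTML posted above, .Children(1) is a <br> and the first report table is .Children(2).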
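Because the report nests its tables several levels deep, looping over the rows of an outer table returns one cell that holds everything. Here is a minimal sketch of one way around that, assuming the structure in the posted HTML (every innermost table is a single column, either the label column or one date column). WriteReportTables is a hypothetical helper name, and the block layout it produces is only one possible choice; it has not been checked against the live site.

```vba
' Sketch only: assumes objIE already shows the loaded report.
' WriteReportTables is a hypothetical helper, not part of the original post.
Public Sub WriteReportTables(ByVal objIE As Object, ByVal ws As Worksheet)
    Dim tables As Object, tbl As Object
    Dim t As Long, r As Long
    Dim topRow As Long, col As Long, blockHeight As Long
    Dim cellText As String

    Set tables = objIE.document.getElementById("printOut").getElementsByTagName("table")
    topRow = 1

    For t = 0 To tables.Length - 1
        Set tbl = tables(t)
        ' Keep only non-empty innermost tables that sit inside a cell of another table
        If tbl.getElementsByTagName("table").Length = 0 _
           And UCase$(tbl.parentElement.tagName) = "TD" _
           And Len(Trim$(tbl.innerText)) > 0 Then
            If tbl.parentElement.cellIndex = 0 Then
                ' First cell of its row: start a new block below the previous one
                topRow = topRow + blockHeight
                If blockHeight > 0 Then topRow = topRow + 1 ' blank spacer row
                col = 1
                blockHeight = 0
            End If
            ' Write this single-column table down one worksheet column
            For r = 0 To tbl.Rows.Length - 1
                cellText = tbl.Rows(r).Cells(0).innerText
                cellText = Replace(Replace(cellText, vbCr, ""), vbLf, "")
                ws.Cells(topRow + r, col).Value = Trim$(cellText)
            Next r
            If tbl.Rows.Length > blockHeight Then blockHeight = tbl.Rows.Length
            col = col + 1
        End If
    Next t
End Sub
```

It could be called from inside LoginScript once the report has loaded, for example WriteReportTables objIE, ThisWorkbook.Sheets(2). Each category (Ventura, Mavericks) should then land in its own block of rows, with one worksheet column per date.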
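Edit3: I just found that I have some code that does almost exactly what you want, except that it copies the HTML table into a 2D array instead of an Excel worksheet. You can then copy the array to the worksheet, or modify the code to copy the table straight to the worksheet.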
Dim MatrizCO() As Variant
ReDim MatrizCO(0 To 60, 0 To 1)
Dim i As Long
Dim table As HTMLTable
Set table = objIE.document.getElementById("DetalleContainerTbl")
On Error GoTo SalirFor ' leave the loop once the table runs out of rows
For i = 0 To 60
    MatrizCO(i, 0) = table.Children(1).Children(i).Children(1).innerText
    MatrizCO(i, 1) = table.Children(1).Children(i).Children(8).innerText
Next i
SalirFor:
On Error GoTo 0 ' restore default error handling
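Code explanation: first, the 2D array is declared as a Variant without dimensions, then ReDim gives it the required dimensions (rows, columns). You could also hard-code the dimensions in the array declaration, but in my case I needed the array to be dynamic. Next, because the table's dimensions may differ from case to case, the program is told to exit the For loop on error. It then loops over all rows i (in your case you should also loop over all columns j). Finally, you simply assign each i, j position in the table to the i, j position in the array (here the j positions are 1 and 8, passed as arguments to .Children). Again, the first .Children argument on table is 1 because in my case the table has a thead tag in addition to the tbody tag. If it did not, that argument would be 0.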
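Copying the finished array to the worksheet can be done in a single assignment. A minimal sketch, assuming a two-dimensional array such as MatrizCO above; DumpArrayToSheet is a hypothetical helper name, not something from the original code.

```vba
' Sketch only: DumpArrayToSheet is a hypothetical helper name.
' Writes a 2D Variant array to the sheet, starting at the cell topLeft.
Public Sub DumpArrayToSheet(ByRef data As Variant, ByVal topLeft As Range)
    Dim rowCount As Long, colCount As Long
    rowCount = UBound(data, 1) - LBound(data, 1) + 1
    colCount = UBound(data, 2) - LBound(data, 2) + 1
    ' Resize the target range to the array's shape and assign all values at once
    topLeft.Resize(rowCount, colCount).Value = data
End Sub
```

For example: DumpArrayToSheet MatrizCO, ThisWorkbook.Sheets(2).Range("A1"). Array positions that were never filled, because the loop exited early, stay Empty and show up as blank cells.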