Assign each TD to its own cell in Excel

Time: 2018-02-07 21:35:53

Tags: excel vba excel-vba internet-explorer web-scraping

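I'm trying to extract the contents of a table on a website; the table sits inside a <div> element. So far I can log in, navigate to the specific page and load the report, but I'm struggling to extract the data from the table. I've found several sources but, unfortunately, I haven't been able to work them into my script.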

This is what I have so far:

Option Explicit

Const MyUserID As String = "test123"
Const MyPassword As String = "test123"
Const READYSTATE_COMPLETE As Integer = 4
Dim objIE As Object
Dim sPageHTML  As String

Public Sub LoginScript()

    Set objIE = CreateObject("InternetExplorer.Application")
    With objIE
        .Visible = True
        .Silent = True
        .Navigate "https://www.mywebsite.com"
        Do Until .ReadyState = READYSTATE_COMPLETE
            DoEvents
        Loop
        Application.Wait Now() + TimeValue("00:00:02")
        .Document.all.txtuserid.Value = MyUserID
        .Document.all.txtPassword.Value = MyPassword
        .Document.getElementsByName("btnSubmit")(0).Click
        Do Until .ReadyState = READYSTATE_COMPLETE
            DoEvents
        Loop
        Application.Wait Now() + TimeValue("00:00:02")
        .Navigate "https://www.mywebsite.com/sample.html"
        Do Until .ReadyState = READYSTATE_COMPLETE
            DoEvents
        Loop
        .Document.getElementsByName("LocationID")(0).Value = "7"
        .Document.getElementsByName("endDay")(0).Value = "31"
        .Document.getElementsByName("endMonth")(0).Value = "12"
        .Document.getElementsByName("endYear")(0).Value = "2018"
        .Document.getElementsByName("view")(0).Click
        Do Until .ReadyState = READYSTATE_COMPLETE
            DoEvents
        Loop
        Application.Wait Now() + TimeValue("00:00:02")
        ThisWorkbook.Sheets("Sheet2").Activate
        Range("A1:K500").ClearContents
        sPageHTML = .Document.getElementByID("printOut").innerText
        ThisWorkbook.Sheets(2).Range("A1") = sPageHTML
        .Navigate "https://www.mywebsite.com/logout"
        Do Until .ReadyState = READYSTATE_COMPLETE
            DoEvents
        Loop
        .Quit
        Shell "RunDll32.exe InetCpl.Cpl, ClearMyTracksByProcess 11"
    End With

End Sub

I understand that the following line

ThisWorkbook.Sheets(2).Range("A1") = sPageHTML

puts the entire content into that single cell. Unfortunately, none of the options I tried let me assign each TD to its own cell.

The ID "printOut" belongs to a <div> element.

Here is the HTML content inside the <div> element:

<div id="printOut">
    <table>
        <tbody>
            <tr>
                <td>
                    <table>
                        <tbody>
                            <tr>
                                <td></td>
                            </tr>
                        </tbody>
                    </table>
                </td>
            </tr>
        </tbody>
    </table><br>
    <table align="center" border="1" cellpadding="1" cellspacing="0">
        <tbody>
            <tr>
                <td>
                    <table align="center" bgcolor="#0080C0" border="0" cellpadding="0" cellspacing="0" width="100%">
                        <tbody>
                            <tr>
                                <td>
                                    <table cellpadding="0" cellspacing="0">
                                        <tbody>
                                            <tr>
                                                <td align="center" bgcolor="#0080C0" class="white" nowrap width="150">
                                                    Ventura<br>
                                                    <a class="red" href="javascript:WinOpenPopup('s_ReservationBufferAdd.asp?dir=add1&amp;goto=Unallocating&amp;CarSizeID=&amp;PickupLocationID=')" title="Book"><img border="0" src="images/b_Unallocat.jpg"></a>
                                                </td>
                                            </tr>
                                            <tr bgcolor="#FFFFFF">
                                                <td class="texunderline"><b>Available</b></td>
                                            </tr>
                                            <tr bgcolor="#FFFFFF">
                                                <td class="texunderline"><b>PickUp</b></td>
                                            </tr>
                                            <tr bgcolor="#FFFFFF">
                                                <td class="texunderline"><b>Dropoff</b></td>
                                            </tr>
                                        </tbody>
                                    </table>
                                </td>
                                <td>
                                    <table border="0" cellpadding="0" cellspacing="0">
                                        <tbody>
                                            <tr>
                                                <td align="right" bgcolor="#0080C0" class="white" nowrap>
                                                    <a class="sheetwhite" href="javascript:WinOpenPopup('s_bookingListDaily.asp?date=2018-02-07&amp;LocationID=7&amp;CategoryID=3')" title="Wednesday"><small><font color="#E6E600"></font></small><br>
                                                    7 Feb&nbsp;</a>
                                                </td>
                                            </tr>
                                            <tr bgcolor="#FFFFFF">
                                                <td align="right" bgcolor="#FF6262" class="white" width="50">0</td>
                                            </tr>
                                            <tr bgcolor="#FFFFFF">
                                                <td align="right" class="texunderline" width="50">0</td>
                                            </tr>
                                            <tr bgcolor="#FFFFFF">
                                                <td align="right" class="texunderline" width="50">0</td>
                                            </tr>
                                        </tbody>
                                    </table>
                                </td>
                                <td>
                                    <table border="0" cellpadding="0" cellspacing="0">
                                        <tbody>
                                            <tr>
                                                <td align="right" bgcolor="#0080C0" class="white" nowrap>
                                                    <a class="sheetwhite" href="javascript:WinOpenPopup('s_bookingListDaily.asp?date=2018-02-08&amp;LocationID=7&amp;CategoryID=3')" title="Thursday"><small><font color="#E6E600"></font></small><br>
                                                    8 Feb&nbsp;</a>
                                                </td>
                                            </tr>
                                            <tr bgcolor="#FFFFFF">
                                                <td align="right" bgcolor="#FF6262" class="white" width="50">0</td>
                                            </tr>
                                            <tr bgcolor="#FFFFFF">
                                                <td align="right" class="texunderline" width="50">0</td>
                                            </tr>
                                            <tr bgcolor="#FFFFFF">
                                                <td align="right" class="texunderline" width="50">0</td>
                                            </tr>
                                        </tbody>
                                    </table>
                                </td>
                            </tr>
                        </tbody>
                    </table>
                </td>
            </tr>
        </tbody>
    </table><br>
    <table align="center" border="1" cellpadding="1" cellspacing="0">
        <tbody>
            <tr>
                <td>
                    <table align="center" bgcolor="#0080C0" border="0" cellpadding="0" cellspacing="0" width="100%">
                        <tbody>
                            <tr>
                                <td>
                                    <table cellpadding="0" cellspacing="0">
                                        <tbody>
                                            <tr>
                                                <td align="center" bgcolor="#0080C0" class="white" nowrap width="150">
                                                    Mavericks<br>
                                                    <a class="red" href="javascript:WinOpenPopup('s_ReservationBufferAdd.asp?dir=add1&amp;goto=Unallocating&amp;CarSizeID=&amp;PickupLocationID=')" title="Book"><img border="0" src="images/b_Unallocat.jpg"></a>
                                                </td>
                                            </tr>
                                            <tr bgcolor="#FFFFFF">
                                                <td class="texunderline"><b>Available</b></td>
                                            </tr>
                                            <tr bgcolor="#FFFFFF">
                                                <td class="texunderline"><b>PickUp</b></td>
                                            </tr>
                                            <tr bgcolor="#FFFFFF">
                                                <td class="texunderline"><b>Dropoff</b></td>
                                            </tr>
                                        </tbody>
                                    </table>
                                </td>
                                <td>
                                    <table border="0" cellpadding="0" cellspacing="0">
                                        <tbody>
                                            <tr>
                                                <td align="right" bgcolor="#0080C0" class="white" nowrap>
                                                    <a class="sheetwhite" href="javascript:WinOpenPopup('s_bookingListDaily.asp?date=2018-02-07&amp;LocationID=7&amp;CategoryID=2')" title="Wednesday"><small><font color="#E6E600"></font></small><br>
                                                    7 Feb&nbsp;</a>
                                                </td>
                                            </tr>
                                            <tr bgcolor="#FFFFFF">
                                                <td align="right" class="texunderline" width="50">9</td>
                                            </tr>
                                            <tr bgcolor="#FFFFFF">
                                                <td align="right" class="texunderline" width="50">0</td>
                                            </tr>
                                            <tr bgcolor="#FFFFFF">
                                                <td align="right" class="texunderline" width="50">1</td>
                                            </tr>
                                        </tbody>
                                    </table>
                                </td>
                                <td>
                                    <table border="0" cellpadding="0" cellspacing="0">
                                        <tbody>
                                            <tr>
                                                <td align="right" bgcolor="#0080C0" class="white" nowrap>
                                                    <a class="sheetwhite" href="javascript:WinOpenPopup('s_bookingListDaily.asp?date=2018-02-08&amp;LocationID=7&amp;CategoryID=2')" title="Thursday"><small><font color="#E6E600"></font></small><br>
                                                    8 Feb&nbsp;</a>
                                                </td>
                                            </tr>
                                            <tr bgcolor="#FFFFFF">
                                                <td align="right" class="texunderline" width="50">10</td>
                                            </tr>
                                            <tr bgcolor="#FFFFFF">
                                                <td align="right" class="texunderline" width="50">0</td>
                                            </tr>
                                            <tr bgcolor="#FFFFFF">
                                                <td align="right" class="texunderline" width="50">2</td>
                                            </tr>
                                        </tbody>
                                    </table>
                                </td>
                            </tr>
                        </tbody>
                    </table>
                    <form action="s_printToExcel.aspx" id="expExcel" method="post" name="expExcel" target="_blank">
                        <input id="htmlOut" name="htmlOut" type="hidden" value="">
                    </form>
                </td>
            </tr>
        </tbody>
    </table>
    <table align="center">
        <tbody>
            <tr>
                <td class="notetext">Note:</td>
                <td class="notetext"></td>
            </tr>
            <tr>
                <td class="notetext">1.</td>
                <td class="notetext">Today's Available = Yesterday's Available + Yesterday's dropoff (does not include same day drop offs) - Today's Pickup + NewFleetNo - OffFleetNo .</td>
            </tr>
            <tr>
                <td class="notetext">2.</td>
                <td align="left" class="notetext">The report includes unallocated bookings</td>
            </tr>
            <tr>
                <td class="notetext">3.</td>
                <td class="notetext">If you click on the date, you can see a listing of the pickups and drop offs for this day and location.</td>
            </tr>
            <tr>
                <td class="notetext">4.</td>
                <td class="notetext">Please make sure all bookings have correct status, i.e., hired, returned, and cancelled if not hired. Incorrect status may cause incorrect vehicle current location.</td>
            </tr>
        </tbody>
    </table><br>
    <p></p>
</div>

I thought I'd share a "solution" to my problem that partly works. It's not perfect.

Set elemCollection = objIE.document.getElementByID("printOut").getElementsByTagName("TABLE")

    For t = 0 To (elemCollection.Length - 1)

        For r = 0 To (elemCollection(t).Rows.Length - 1)
            For c = 0 To (elemCollection(t).Rows(r).Cells.Length - 1)
                ThisWorkbook.Worksheets(2).Cells(r + 1, c + 1) = elemCollection(t).Rows(r).Cells(c).innerText
            Next c
        Next r
    Next t

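This extracts one of the tables inside the <div> element but, strangely, not both when the report contains two. Content that spans several rows ends up in a single cell, which I then have to split, as in this screenshot:

[screenshot: worksheet output]

Here is a sample image of the report on the website:

[screenshot: report on the website]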

1 answer:

Answer 0 (score: 0)

You shouldn't work with the div; use the table instead, then loop over each tr and each td. You could try something like this:

    ' The HTMLTable type needs a reference to "Microsoft HTML Object Library".
    ' tableIndex is the number of the table tag, counting from 0; set it for your page.
    Dim tableIndex As Long: tableIndex = 0
    Dim table As HTMLTable: Set table = .Document.getElementsByTagName("table")(tableIndex)
    Dim tr As Object, td As Object

    Dim i As Long: i = 1 'Starting from row 1
    Dim j As Long
    For Each tr In table.Children(0).Children 'Children(0) of table is usually a tbody, use Children(1) if the table also contains a thead tag
        j = 1 'Restart at column 1 ("A") for every row
        For Each td In tr.Children
            Cells(i, j) = td.innerText
            j = j + 1
        Next td
        i = i + 1
    Next tr

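If you can't reach the table by id, try reaching it through the div's class or tagName, and then use .Children(i) to get to the table and the elements nested below it.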

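Edit: I edited the code so that, apart from the number of the table tag, it should work without further editing. Just paste it below Range("A1:K500").ClearContents and post a comment. The number of the table tag is the one you get by manually counting every <table> tag in the order it appears in the HTML, starting from 0.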

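Edit2: Just saw the HTML. Instead of .document.getElementsByTagName("table")(...), you can initialize the table variable directly with .document.getElementByID("printOut").children(0). The .Children(i) method returns the i-th child element of a tag, i.e. the i-th tag nested directly inside that particular tag. In this case, the first (0) child of the div is the first table tag inside that div. If that is the table you want, leave it as .Children(0). If not, try .Children(1), .Children(2) and so on to get the other tables in the div. Keep in mind that .Children counts every child element, so the <br> tags between the tables in the posted HTML also take up an index.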
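Because the posted HTML wraps the values in several levels of nested tables, one way to avoid counting indexes by hand is to write only the innermost tables. The following is a rough sketch under that assumption, not a tested solution for the real site: WriteLeafTables and its parameters are hypothetical names, and it simply places every innermost table to the right of the previous one, starting at row 1.

```vba
' Sketch only: WriteLeafTables, container and ws are hypothetical names.
' Assumes container is the "printOut" div and that the useful values live in
' tables that contain no further nested tables.
Public Sub WriteLeafTables(ByVal container As Object, ByVal ws As Worksheet)
    Dim tables As Object, tbl As Object
    Dim t As Long, r As Long, c As Long
    Dim nextCol As Long, widest As Long

    nextCol = 1
    Set tables = container.getElementsByTagName("table")

    For t = 0 To tables.Length - 1
        Set tbl = tables.Item(t)
        ' Skip wrapper tables; writing them too would overwrite or merge values.
        If tbl.getElementsByTagName("table").Length = 0 Then
            widest = 1
            For r = 0 To tbl.Rows.Length - 1
                For c = 0 To tbl.Rows(r).Cells.Length - 1
                    ws.Cells(r + 1, nextCol + c).Value = tbl.Rows(r).Cells(c).innerText
                Next c
                If tbl.Rows(r).Cells.Length > widest Then widest = tbl.Rows(r).Cells.Length
            Next r
            ' The next innermost table starts to the right of this one.
            nextCol = nextCol + widest
        End If
    Next t
End Sub
```

It could be called as WriteLeafTables objIE.document.getElementById("printOut"), ThisWorkbook.Worksheets(2) after the report has loaded. With the HTML shown in the question, the empty first table and the notes table are innermost tables too, so they would also get columns and may need to be filtered out.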

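Edit3: I just realized I have some code that does almost exactly what you want, except that it copies the HTML table into a 2D array rather than an Excel worksheet. You can then copy the array to the worksheet, or modify the code to copy the table straight to the worksheet.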

Dim MatrizCO() As Variant
ReDim MatrizCO(0 To 60, 0 To 1)
Dim i As Long
Dim table As HTMLTable: Set table = objIE.document.getElementById("DetalleContainerTbl")
On Error GoTo SalirFor 'Leave the loop when the table has fewer rows than expected
For i = 0 To 60
    'Children(1) is the tbody here because this table also has a thead
    MatrizCO(i, 0) = table.Children(1).Children(i).Children(1).innerText
    MatrizCO(i, 1) = table.Children(1).Children(i).Children(8).innerText
Next
SalirFor:

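Code explanation: first, the 2D array is declared as a Variant without dimensions, and ReDim then gives it the dimensions it needs (rows, columns). You could also hard-code the dimensions in the declaration, but in my case the array had to be dynamic. Next, the program is told to leave the For loop on error, in case the table's dimensions differ from one run to the next. Then it loops over all rows i (in your case you should also loop over all columns j). Finally, each i,j position in the array is pointed at the i,j position in the table (here, the j positions are 1 and 8, passed as arguments to .Children). Again, the first .Children argument on table is 1 because in my case the table has a <thead> tag in addition to the <tbody> tag. If it didn't, the argument would be 0.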
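For the step of copying the array to the worksheet, a minimal sketch could look like the following. DumpArrayToSheet and its parameter names are hypothetical, and it assumes the array is two-dimensional, like MatrizCO above.

```vba
' Sketch only: DumpArrayToSheet, data and topLeft are hypothetical names.
' Writes a 2D array to the worksheet, starting at the cell passed as topLeft.
Public Sub DumpArrayToSheet(ByRef data As Variant, ByVal topLeft As Range)
    Dim rowCount As Long, colCount As Long

    rowCount = UBound(data, 1) - LBound(data, 1) + 1
    colCount = UBound(data, 2) - LBound(data, 2) + 1

    ' Assigning the array to a range of the same size writes all values in one call.
    topLeft.Resize(rowCount, colCount).Value = data
End Sub
```

It could be called as DumpArrayToSheet MatrizCO, ThisWorkbook.Worksheets(2).Range("A1"). Writing the whole array at once is usually faster than filling the cells one by one, and rows that were never filled because the loop exited early simply stay empty.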