Extracting innerText from a single cell

Posted: 2013-07-16 11:01:51

Tags: excel vba excel-vba html-parsing

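I'm trying to extract only the inner text of the rightmost cell in an HTML table. Below is a small part of the HTML. The row contains 810 cells, and the TR tag contains 811 TD tags: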

</tr><tr align="center" id="spt_inner_row_2"><td nowrap="nowrap" bgcolor="#EEEEEE" style="border-bottom: 1px solid white; border-right: 1px solid white">
&nbsp;300 - 305&nbsp;
</td><td nowrap="nowrap" bgcolor="#EEEEEE" style="border-bottom: 1px solid white; border-right: 1px solid white">
&nbsp;300 - 305&nbsp;
</td><td nowrap="nowrap" bgcolor="#EEEEEE" style="border-bottom: 1px solid white; border-right: 1px solid white">
&nbsp;300 - 305&nbsp;
</td><td nowrap="nowrap" bgcolor="#EEEEEE" style="border-bottom: 1px solid white; border-right: 1px solid white">
&nbsp;300 - 305&nbsp;

The code I'm currently using successfully pulls the data from every cell and writes it into column A of the active sheet:

Sub GetData()

    Dim URL As String
    Dim IE As InternetExplorer
    Dim HTMLdoc As HTMLDocument
    Dim TDelements As IHTMLElementCollection
    Dim TDelement As HTMLTableCell
    Dim r As Long

    'For login use
    Dim LoginForm As HTMLFormElement
    Dim UserNameInputBox As HTMLInputElement
    Dim PasswordInputBox As HTMLInputElement
    Dim SignInButton As IHTMLElement

    URL = "https://www.whatever.com"

    Set IE = New InternetExplorer

    With IE
        .navigate URL
        .Visible = True

        'Wait for page to load
        While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend

        Set HTMLdoc = .document

        'Enter login info
        Set LoginForm = HTMLdoc.forms(0)

        'Username
        Set UserNameInputBox = LoginForm.elements("username")
        UserNameInputBox.Value = "username"

        'Password
        Set PasswordInputBox = LoginForm.elements("password")
        PasswordInputBox.Value = "password"

        'Get the form's sign-in button and click it
        Set SignInButton = LoginForm.elements("doLogin")
        SignInButton.Click

        'Wait for the new page to load
        Do While .readyState <> READYSTATE_COMPLETE Or .Busy: DoEvents: Loop

        'The site auto-navigates to a start page after login, so navigate to the target URL once more
        .navigate URL
        Do While .readyState <> READYSTATE_COMPLETE Or .Busy: DoEvents: Loop

        'Re-acquire the document: the reference taken before navigating may still point at the old page
        Set HTMLdoc = .document

    End With


    'Specify how to recognize data to extract
    Set TDelements = HTMLdoc.getElementById("spt_inner_row_2").getElementsByTagName("TD")


    r = 0

    For Each TDelement In TDelements

        ActiveSheet.Range("A1").Offset(r, 0).Value = TDelement.innerText

        r = r + 1

    Next

End Sub

What I actually need is to extract only the last (rightmost) cell in the HTML table row. Any suggestions?

1 answer:

Answer 0 (score: 0)

IHTMLElementCollection has a length property and an item property. item accepts a numeric index, but the index is zero-based, so the last entry is at length - 1:

Dim TDelements As IHTMLElementCollection

Set TDelements = HTMLdoc.getElementById("spt_inner_row_2").getElementsByTagName("TD")

With TDelements
    MsgBox .item(.length - 1).innerText
End With
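To fold this into the question's GetData, the For Each loop can be replaced by a single lookup. The sketch below is one possible way to do it, assuming the same early-bound references the question already uses (Microsoft HTML Object Library) and the spt_inner_row_2 id from the question; WriteLastCell and CleanCellText are hypothetical helper names. It guards against a missing or empty row, and it converts the non-breaking spaces that &nbsp; becomes in innerText (ChrW(160)) into ordinary spaces before trimming, because VBA's Trim only strips ordinary spaces.

```vba
' Assumes a reference to "Microsoft HTML Object Library" (MSHTML), as in the question.

' Hypothetical helper: innerText turns &nbsp; into ChrW(160), which Trim ignores,
' so convert those to ordinary spaces first, then trim both ends.
Public Function CleanCellText(ByVal s As String) As String
    CleanCellText = Trim$(Replace(s, ChrW(160), " "))
End Function

' Hypothetical helper: writes the text of the last TD in the row with the given id
' to A1 of the active sheet. Returns False if the row is missing or has no cells.
Public Function WriteLastCell(ByVal HTMLdoc As HTMLDocument, ByVal rowId As String) As Boolean
    Dim rowElement As IHTMLElement2        ' IHTMLElement2 exposes getElementsByTagName
    Dim TDelements As IHTMLElementCollection

    Set rowElement = HTMLdoc.getElementById(rowId)
    If rowElement Is Nothing Then Exit Function      ' no element with that id

    Set TDelements = rowElement.getElementsByTagName("TD")
    If TDelements.length = 0 Then Exit Function      ' row has no cells

    ' Indexes are zero-based, so the last cell is at length - 1
    ActiveSheet.Range("A1").Value = CleanCellText(TDelements.item(TDelements.length - 1).innerText)
    WriteLastCell = True
End Function
```

In GetData this would replace everything from the getElementsByTagName line through the loop, for example: If Not WriteLastCell(HTMLdoc, "spt_inner_row_2") Then MsgBox "Row not found".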