在excel中提取网页的特定表部分?

时间:2016-10-27 17:38:58

标签: excel vba excel-vba excel-web-query

我是Excel VBA / Macro的新手

我需要抓住页面的特定部分,而不是整页。 波纹管代码在完整的页面中工作,但不需要页面的所有部分。

Sub GrabOutStandingTable()

With ActiveSheet.QueryTables.Add(Connection:= _
    "URL;http://dsebd.org/displayCompany.php?name=ABBANK", Destination:=Range( _
    "$A$1"))
    .CommandType = 0
    .Name = "displayCompany.php?name=ABBANK"
    .FieldNames = True
    .RowNumbers = False
    .FillAdjacentFormulas = False
    .PreserveFormatting = True
    .RefreshOnFileOpen = False
    .BackgroundQuery = True
    .RefreshStyle = xlInsertDeleteCells
    .SavePassword = False
    .SaveData = True
    .AdjustColumnWidth = True
    .RefreshPeriod = 0
    .WebSelectionType = xlSpecifiedTables
    .WebFormatting = xlWebFormattingNone
    .WebTables = """company"""
    .WebPreFormattedTextToColumns = True
    .WebConsecutiveDelimitersAsOne = True
    .WebSingleBlockTextImport = False
    .WebDisableDateRecognition = False
    .WebDisableRedirections = False
    .Refresh BackgroundQuery:=False
End With
Sheets.Add After:=ActiveSheet
End Sub

表格部分为" 公司其他信息"在页面的下半部分,这就是我所说的。宏应该提取这部分。

2 个答案:

答案 0 :(得分:0)

旧数据>由于网站的结构方式,Web不会处理这个问题。您需要的数据深深嵌套到其他表中,并由几个表组成。

建议使用Power Query(不需要VBA)。以下是如何在XL2013中使用Power Query。使用Excel的功能区并找到标签POWER QUERY。

  1. 使用菜单选项:POWER QUERY>来自网络
  2. 从Web对话框显示。输入您的网址。
  3. 点击确定
  4. 您需要的数据在表30中。找到并单击它,然后单击“加载”。
  5. 如果我们有XL2016(办公室365),我们已经有Power Query。如果我们有XL2010或XL2013,我们可以从https://www.microsoft.com/en-us/download/details.aspx?id=39379&CorrelationId=1441491e-917e-43de-8d6a-21f98287c3c2

    下载

答案 1 :(得分:0)

XHR请求:

如果您通过(非唯一的)company ID来收集元素,则可以进行更快的无浏览器XHR请求,只需定位目标表(位于位置23)即可。

我使用querySelectorAll方法来获取匹配的节点,然后在索引23处提取表。

请注意随后在代码输出中显示的其他赞助者信息。


网页视图:

page


示例代码输出:

Sheet view


代码:

Option Explicit
Public Sub GetTable()
    Dim sResponse As String, hTable As Object, HTML As New HTMLDocument
    Application.ScreenUpdating = False
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://dsebd.org/displayCompany.php?name=ABBANK", False
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With

    sResponse = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
        With HTML
            .body.innerHTML = sResponse
            Set hTable = .querySelectorAll("#company")(23)
        End With
       WriteTable hTable
       Application.ScreenUpdating = True
End Sub

Public Sub WriteTable(ByVal hTable As HTMLTable, Optional ByVal startRow As Long = 1, Optional ByVal ws As Worksheet)

    If ws Is Nothing Then Set ws = ActiveSheet

    Dim tSection As Object, tRow As Object, tCell As Object, tr As Object, td As Object, R As Long, C As Long, tBody As Object
    R = startRow
    With ws
        Dim headers As Object, header As Object, columnCounter As Long
        Set headers = hTable.getElementsByTagName("th")
        For Each header In headers
            columnCounter = columnCounter + 1
            .Cells(startRow, columnCounter) = header.innerText
        Next header
        startRow = startRow + 1
        Set tBody = hTable.getElementsByTagName("tbody")
        For Each tSection In tBody               'HTMLTableSection
            Set tRow = tSection.getElementsByTagName("tr") 'HTMLTableRow
            For Each tr In tRow
                R = R + 1
                Set tCell = tr.getElementsByTagName("td")
                C = 1
                For Each td In tCell             'DispHTMLElementCollection
                    .Cells(R, C).Value = td.innerText 'HTMLTableCell
                    C = C + 1
                Next td
            Next tr
        Next tSection
    End With
End Sub

参考:

VBE>工具>参考> HTML对象库