How to extract information from a nested div on a website using VBA

Time: 2019-06-19 10:52:29

Tags: excel vba

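I want to scrape information from a table that sits inside a div element.

I have managed to extract information when the id does not belong to a div element. When I try to get the div element by its ID, I get:

Error 13: Type mismatch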

Sub Test1()
    Dim IE As Object

    Set IE = CreateObject("InternetExplorer.Application")

    IE.Visible = True
    IE.navigate "http://www.concorindia.com/containerquery.aspx"

    Do While IE.Busy
        Application.Wait DateAdd("s", 1, Now)
    Loop

    Set Doc = IE.document

    IE.document.getElementById("contno").Value = ThisWorkbook.Sheets("Status").Range("B3").Value
    Doc.getElementById("CONTButton1").Click       

    Set Data = Doc.getElementById("PPosition")

End Sub

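My plan was to first get everything inside the div with id "PPosition" and then extract the information from it, but a message box shows Error 13: Type mismatch.

Can someone help me get the information from the table using the code above, e.g. train number, departure status, etc.?

Sample container number: TCNU4171692

The website I intend to scrape is the one mentioned in the code (http://www.concorindia.com/containerquery.aspx).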

1 Answer:

Answer 0 (score: 1)

Here is a general approach for printing a whole HTML table to a worksheet:

Sub ScrapeContainerInfo()
    Dim req As New WinHttpRequest
    Dim doc As New HTMLDocument
    Dim div As HTMLDivElement
    Dim table As HTMLTable
    Dim tableRow As HTMLTableRow
    Dim tableCell As HTMLTableCell
    Dim sht As Worksheet
    Dim i As Long, j As Long
    Dim url As String, containerNumber As String, reqBody As String

    Set sht = ThisWorkbook.Worksheets("Sheet2")
    containerNumber = "TCNU4171692"
    url = "http://www.concorindia.com/containerquery.aspx"
    ' Form data that the page posts when the query button is clicked
    reqBody = "__VIEWSTATE=%2FwEPDwULLTE1Njk0Mzk4MzkPZBYCAgoPZBYEAgEPDxYCHgdWaXNpYmxlaGRkAgMPZBYEAgMPEGRkFgFmZAIFDw9kFgIeB29uY2xpY2sFIWphdmFzY3JpcHQ6ZXJyPXRlc3QoKTtyZXR1cm4gZXJyO2RkS1KgJsS2Kb22YOy%2FEN0XTBRc8lY%3D&__EVENTVALIDATION=%2FwEWBgKk%2BrO6AwKhk42ICgKmqIGHDAKbyfWzBQLvyamyBQKxlra5AfFIxQQ%2BvdUNsDciaOk4g0LyycSG&contno=" & containerNumber & "&drpimpexp=Any&CONTButton1=Submit+Query"

    ' Send the request and load the response into an HTML document
    With req
        .Open "POST", url, False
        .setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
        .send reqBody
        doc.body.innerHTML = .responseText
    End With

    ' The results table is the first table inside the "PPosition" div
    Set div = doc.getElementById("PPosition")
    Set table = div.getElementsByTagName("table")(0)

    ' Write every cell to the sheet, starting at row 2, column 2
    i = 1
    For Each tableRow In table.Rows
        i = i + 1
        j = 1
        For Each tableCell In tableRow.Cells
            j = j + 1
            sht.Cells(i, j) = tableCell.innerText
        Next tableCell
    Next tableRow
End Sub

References used: Microsoft HTML Object Library and Microsoft WinHTTP Services, version 5.1

The output looks like this:

[image]

Now, if you want to access the table's information in a more targeted way, you could do this:

Debug.Print table.Rows(1).Cells(0).innerText

The line above prints the first cell of the table's second row to the Immediate window. You can modify it to access any cell; keep in mind that indexing starts from 0.
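As a minimal sketch of that targeted access, the lookup could be wrapped in a small helper that checks the bounds first, so an out-of-range index returns an empty string instead of raising an error. GetCellText is a hypothetical name that is not part of the original answer, and the sketch assumes the Microsoft HTML Object Library reference is set.

```vba
' Hypothetical helper (not from the original answer): returns the text of one
' table cell, or "" when the row or column index is out of range.
' Assumes a reference to the Microsoft HTML Object Library.
' Both indexes are zero-based.
Function GetCellText(ByVal table As HTMLTable, _
                     ByVal rowIndex As Long, _
                     ByVal colIndex As Long) As String
    Dim tableRow As HTMLTableRow

    ' Leave the result empty if the row does not exist
    If rowIndex < 0 Or rowIndex >= table.Rows.Length Then Exit Function
    Set tableRow = table.Rows(rowIndex)

    ' Leave the result empty if the cell does not exist
    If colIndex < 0 Or colIndex >= tableRow.Cells.Length Then Exit Function
    GetCellText = tableRow.Cells(colIndex).innerText
End Function
```

With the table variable from ScrapeContainerInfo, Debug.Print GetCellText(table, 1, 0) would then print the same cell as the line above.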

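Edit

I wrongly assumed that getting the actual HTML response was not the problem, but since it apparently is, I have updated the code to include the HTTP request that needs to be sent. I try to avoid using IE whenever possible.

I have hard-coded one specific container number. The code can easily be modified to loop through multiple container numbers.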
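One way such a loop might look is sketched below. It is an untested sketch under several assumptions that are not in the original answer: the container numbers sit in column A of a sheet named "Containers" with a header in row 1, the fixed start of the POST body from ScrapeContainerInfo (everything up to and including "contno=") is passed in as the hypothetical formPrefix parameter, and the server accepts the same hard-coded form state for every request. For brevity it writes only the first cell of the second table row next to each container number.

```vba
' Sketch only: looks up every container number listed in column A.
' Assumed, not from the original answer: sheet name "Containers", header in
' row 1, and the formPrefix parameter (start of the POST body up to "contno=").
' Needs references to Microsoft HTML Object Library and
' Microsoft WinHTTP Services, version 5.1.
Sub ScrapeManyContainers(ByVal formPrefix As String)
    Const TARGET_URL As String = "http://www.concorindia.com/containerquery.aspx"
    Const FORM_SUFFIX As String = "&drpimpexp=Any&CONTButton1=Submit+Query"

    Dim req As New WinHttpRequest
    Dim doc As HTMLDocument
    Dim div As HTMLDivElement
    Dim tables As IHTMLElementCollection
    Dim tbl As HTMLTable
    Dim sht As Worksheet
    Dim r As Long, lastRow As Long
    Dim result As String

    Set sht = ThisWorkbook.Worksheets("Containers")
    lastRow = sht.Cells(sht.Rows.Count, 1).End(xlUp).Row

    For r = 2 To lastRow
        ' Send one POST per container number
        req.Open "POST", TARGET_URL, False
        req.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
        req.send formPrefix & Trim$(CStr(sht.Cells(r, 1).Value)) & FORM_SUFFIX

        ' Parse each response in a fresh document
        Set doc = New HTMLDocument
        doc.body.innerHTML = req.responseText

        ' Fall back to a marker text when the expected structure is missing
        result = "no data"
        Set div = doc.getElementById("PPosition")
        If Not div Is Nothing Then
            Set tables = div.getElementsByTagName("table")
            If tables.Length > 0 Then
                Set tbl = tables(0)
                If tbl.Rows.Length > 1 Then
                    result = tbl.Rows(1).Cells(0).innerText
                End If
            End If
        End If

        sht.Cells(r, 2).Value = result
    Next r
End Sub
```

It would be called with the same string that ScrapeContainerInfo concatenates in front of containerNumber, for example ScrapeManyContainers prefix, where prefix holds that string.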