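Parsing the HTML shown in a website's Inspect Element view with VBA

Time: 2018-04-26 06:54:16

Tags: html vba

I need to extract the image src values from a website. I wrote some code, but it does not work.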

Sub getSrcAttributeImgTag()
    Dim ie As InternetExplorer
    Dim html As HTMLDocument
    Dim ElementCol As Object, Link As Object
    Dim ecol As Long

    Application.ScreenUpdating = False

    Set ie = New InternetExplorer
    ie.Visible = False

    ie.navigate "http://test.site/showadv.php?rstr=0.8809384451371399"

    Do While ie.readyState <> READYSTATE_COMPLETE

    Loop

    Set html = ie.document
    Set ElementCol = html.getElementsByTagName("img")

    For Each Link In ElementCol
        ecol = Worksheets("sheet1").Cells(1, Columns.Count).End(xlToLeft).Offset(0, 1).Column
        Cells(1, ecol).Value = Link.src
        Cells(1, ecol).Columns.AutoFit
    Next

    Set ie = Nothing
    Application.StatusBar = ""
    Application.ScreenUpdating = True
End Sub

The HTML is as follows:

  <form name="mainf" onsubmit="javascript:dosub();return false;">
  <div align="center">
  <table>
  <tbody><tr>
  <td><div id="cimg1"><img width="35" height="55" src="images/capchs/4.png"> 
  </div></td>
  <td><div id="cimg2"><img width="35" height="55" src="images/capchs/3.png"> 
  </div></td>
  <td><div id="cimg3"><img width="35" height="55" src="images/capchs/3.png"> 
   </div></td>
  <td><div id="cimg4"><img width="35" height="55" src="images/capchs/5.png"> 
   </div></td>
  <td><div id="cimg5">&nbsp;</div></td>
 </tr>
</tbody></table>
</div><br>
<div align="center">
 <input type="text" name="capcha" value="" style="width:200px;height:30px;font-size:15px;">
</div><br><br>
<div align="center">
<input type="button" value="CONTINUE" style="width:200px;height:30px;font-size:15px;" onclick="javascript:dosub()">
</div><br><br>
<div align="center">
<br>
 </div><br><br>
 </form>

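But my VBA code does not pick up the HTML that Inspect Element shows.

The result should end up in Excel column A as "images/capchs/5.png", and so on.

Thanks in advance.

2 answers:

Answer 0: (score: 0)

Can you simply parse the response text?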

Option Explicit

Sub test()
    Dim a As String
    a = Range("A1").Text '<==this would actually be your response HTML
    Dim arr() As String, i As Long
    arr = Split(a, "src=")

    For i = 1 To UBound(arr)
        Debug.Print Replace(Split(arr(i), ">")(0), Chr(34), vbNullString)
    Next i
End Sub

Possibly something like:

Public Sub getSrcAttributeImgTag()
    Const ecol As Long = 1
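    ' Split returns a String array, so arr must be declared As String
    ' (a Variant array here raises a type mismatch on assignment).
    Dim responseString As String, i As Long, arr() As String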

    With CreateObject("MSXML2.serverXMLHTTP")

        .Open "GET", "http://test.site/showadv.php?rstr=0.8809384451371399", False
        .send
         responseString = .responseText
    End With

    arr = Split(responseString, "src=")

    For i = 1 To UBound(arr)
        ThisWorkbook.ActiveSheet.Cells(i, ecol) = Replace(Split(arr(i), ">")(0), Chr(34), vbNullString)
        ThisWorkbook.ActiveSheet.Cells(i, ecol).Columns.AutoFit
    Next i

End Sub

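I am assuming waits are not a big issue, but if they are, you can loop until the page has loaded, which you seem to know how to do. You can also introduce a wait with Application.Wait Now + TimeSerial(h, m, s).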
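If you keep the InternetExplorer approach from the question, a wait loop along these lines may be more robust than an empty loop. This is only a sketch: WaitForLoad and timeoutSeconds are hypothetical names, not part of the original code, and the ie argument is assumed to be an InternetExplorer object.

```vba
Option Explicit

' Hypothetical helper: waits until the browser reports the page as loaded,
' or until timeoutSeconds have elapsed. Returns True if loading completed.
Public Function WaitForLoad(ByVal ie As Object, _
                            Optional ByVal timeoutSeconds As Integer = 30) As Boolean
    Const READYSTATE_COMPLETE As Long = 4
    Dim deadline As Date

    deadline = Now + TimeSerial(0, 0, timeoutSeconds)

    Do While ie.Busy Or ie.readyState <> READYSTATE_COMPLETE
        If Now > deadline Then Exit Function   ' gives up and returns False
        DoEvents                               ' keep Excel responsive while waiting
    Loop

    WaitForLoad = True
End Function
```

You would call it right after ie.navigate, for example If Not WaitForLoad(ie) Then Exit Sub. Note that images injected by script after the page has loaded may still need an extra pause.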

It is hard to say with only the above, but you may be able to get what you need with something like document.querySelectorAll("div img").
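As a rough illustration of that selector, the sketch below parses a fragment of the HTML from the question and prints each image source. It assumes a reference to the Microsoft HTML Object Library (Tools > References); the procedure name ListImageSources is made up for the example, and the HTML is hard-coded rather than downloaded.

```vba
Option Explicit

' Requires a reference to "Microsoft HTML Object Library".
Public Sub ListImageSources()
    Dim html As MSHTML.HTMLDocument
    Dim nodes As Object
    Dim i As Long

    Set html = New MSHTML.HTMLDocument

    ' Assumption: in real use this string would be the response text of the page.
    html.body.innerHTML = _
        "<div id=""cimg1""><img src=""images/capchs/4.png""></div>" & _
        "<div id=""cimg2""><img src=""images/capchs/3.png""></div>"

    ' Select every img element that sits inside a div.
    Set nodes = html.querySelectorAll("div img")

    ' Index into the node list rather than using For Each.
    For i = 0 To nodes.Length - 1
        Debug.Print nodes.item(i).getAttribute("src")
    Next i
End Sub
```

Depending on the document mode, the value returned for src may come back resolved against the document's base address (for example with an about: prefix) rather than exactly as written, so you may need to strip that prefix.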

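Answer 1: (score: 0)

You should be able to inspect the TD elements quite easily. Here is a generic solution. Just customize it for your specific scenario, which I cannot access.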

Option Explicit
Sub Web_Table_Option_One()
    Dim xml    As Object
    Dim html   As Object
    Dim objTable As Object
    Dim result As String
    Dim lRow As Long
    Dim lngTable As Long
    Dim lngRow As Long
    Dim lngCol As Long
    Dim ActRw As Long
    Set xml = CreateObject("MSXML2.XMLHTTP.6.0")
    With xml
        .Open "GET", "http://www.ewrinc.com/cotton/contentPublic/reports/stateReceipts.aspx", False
        .send
    End With
    result = xml.responseText
    Set html = CreateObject("htmlfile")
    html.body.innerHTML = result
    Set objTable = html.getElementsByTagName("Table")
    For lngTable = 0 To objTable.Length - 1
        For lngRow = 0 To objTable(lngTable).Rows.Length - 1
            For lngCol = 0 To objTable(lngTable).Rows(lngRow).Cells.Length - 1
                ThisWorkbook.Sheets("Sheet1").Cells(ActRw + lngRow + 1, lngCol + 1) = objTable(lngTable).Rows(lngRow).Cells(lngCol).innerText
            Next lngCol
        Next lngRow
        ActRw = ActRw + objTable(lngTable).Rows.Length + 1
    Next lngTable
End Sub
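One possible customization for the question's markup is sketched below: instead of reading innerText from each cell, it looks for img elements inside every TD and writes their src values down column A. The procedure name, the hard-coded HTML, and the Sheet1 target are assumptions made for the example.

```vba
Option Explicit

Public Sub ListImagesInTableCells()
    Dim html As Object
    Dim tdCells As Object
    Dim imgs As Object
    Dim i As Long, j As Long
    Dim rowOut As Long

    Set html = CreateObject("htmlfile")

    ' Assumption: in real use this would be xml.responseText from the request.
    html.body.innerHTML = "<table><tr>" & _
        "<td><div id=""cimg1""><img src=""images/capchs/4.png""></div></td>" & _
        "<td><div id=""cimg2""><img src=""images/capchs/3.png""></div></td>" & _
        "<td><div id=""cimg5"">&nbsp;</div></td>" & _
        "</tr></table>"

    Set tdCells = html.getElementsByTagName("td")

    For i = 0 To tdCells.Length - 1
        ' A cell may hold no image at all, in which case the inner loop is skipped.
        Set imgs = tdCells(i).getElementsByTagName("img")
        For j = 0 To imgs.Length - 1
            rowOut = rowOut + 1
            ThisWorkbook.Worksheets("Sheet1").Cells(rowOut, 1).Value = _
                imgs(j).getAttribute("src")
        Next j
    Next i
End Sub
```

As with any parsing of the raw response, this only sees images present in the downloaded HTML; anything added later by JavaScript on the page will not appear.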