Parsing the text after a div with VBA

Asked: 2014-02-03 22:26:45

Tags: vba html

I want to extract information from a web page. The relevant part of the page looks like this:

    <div class="class1">
      <div class="class2">Address</div>
         address1<br>
         address2<br>
         Zipcode, City, Country<br>
      <div class="class2">phone</div>
         +1 352 555 555<br>
      <div class="class2">Mobile</div>
         0563 555 31 56
      <div class="class2">Email</div>
         email@provider.com<br>
    </div>

I want to get the following values:

  • address1
  • address2
  • Zipcode, City, Country
  • +1 352 555 555
  • 0563 555 31 56
  • email@provider.com

I have the following code:

    ' Requires a reference to "Microsoft HTML Object Library" (for HTMLDocument).
    Sub GetData()
        Dim oHtml       As HTMLDocument
        Dim oElement    As Object
        Dim i           As Integer

        Set oHtml = New HTMLDocument

        ' Download the page and load its markup into the document
        With CreateObject("WINHTTP.WinHTTPRequest.5.1")
            .Open "GET", "http://www.example.com", False
            .send
            oHtml.body.innerHTML = .responseText
        End With

        ' Print the text of every element with class "class1"
        i = 1
        For Each oElement In oHtml.getElementsByClassName("class1")
            Debug.Print i, oElement.outerText
            i = i + 1
        Next oElement
    End Sub

Any help is appreciated!

Thank you, Tina

1 Answer:

Answer 0 (score: 0)

Quick and dirty, but it works:

    Sub GetData()
        Dim oHtml       As HTMLDocument
        Dim resultText  As String

        Set oHtml = New HTMLDocument

        With CreateObject("WINHTTP.WinHTTPRequest.5.1")
            .Open "GET", "http://www.example.com", False
            .send
            oHtml.body.innerHTML = .responseText
        End With

        ' Get the inner text of the first div with class "class1"
        resultText = oHtml.getElementsByClassName("class1")(0).innerText

        ' Remove the header labels
        resultText = Replace(resultText, "Address", "")
        resultText = Replace(resultText, "phone", "")
        resultText = Replace(resultText, "Mobile", "")
        resultText = Replace(resultText, "Email", "")

        ' Collapse doubled line breaks (run twice to catch runs of three or four)
        resultText = Replace(resultText, vbCrLf & vbCrLf, vbCrLf)
        resultText = Replace(resultText, vbCrLf & vbCrLf, vbCrLf)

        Debug.Print resultText
    End Sub
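
The `Replace` approach above only works if the header labels are known in advance and never occur inside the values. A possible alternative, shown here only as a rough sketch, is to walk the child nodes of the `class1` div and keep just the text nodes, so the `class2` label divs are skipped whatever they contain. `ExtractValues` and `DemoExtract` are hypothetical names introduced for illustration; the sketch assumes the same reference to "Microsoft HTML Object Library" as the code above and that the values sit directly inside the `class1` div, as in the sample markup.

```vba
Option Explicit

' Hypothetical helper: returns the non-empty text nodes found directly
' inside the first <div class="class1">, skipping child elements such as
' the <div class="class2"> labels and the <br> tags.
Function ExtractValues(ByVal htmlText As String) As Collection
    Dim oHtml   As HTMLDocument
    Dim oDiv    As Object
    Dim oNode   As Object
    Dim result  As New Collection
    Dim txt     As String
    Dim i       As Long

    Set oHtml = New HTMLDocument
    oHtml.body.innerHTML = htmlText

    For Each oDiv In oHtml.getElementsByTagName("div")
        If oDiv.className = "class1" Then
            For i = 0 To oDiv.childNodes.Length - 1
                Set oNode = oDiv.childNodes.Item(i)
                If oNode.NodeType = 3 Then          ' 3 = text node
                    ' Strip line breaks and tabs, then surrounding spaces
                    txt = oNode.NodeValue
                    txt = Replace(txt, vbCr, "")
                    txt = Replace(txt, vbLf, "")
                    txt = Replace(txt, vbTab, "")
                    txt = Trim$(txt)
                    If Len(txt) > 0 Then result.Add txt
                End If
            Next i
            Exit For                                ' only the first match
        End If
    Next oDiv

    Set ExtractValues = result
End Function

' Usage sketch: feed the downloaded page to the helper and print each value.
Sub DemoExtract()
    Dim item As Variant

    With CreateObject("WINHTTP.WinHTTPRequest.5.1")
        .Open "GET", "http://www.example.com", False
        .send
        For Each item In ExtractValues(.responseText)
            Debug.Print item
        Next item
    End With
End Sub
```

Because each value arrives as a separate item of the collection, it can be written to its own worksheet cell or variable without splitting a string on line breaks afterwards.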