VBA - 如果第一行中的条件为真,则从HTMl表中获取innertext

时间:2017-01-03 11:19:16

标签: html excel vba screen-scraping

我是新手,需要为我的模块提供一些建议。 我创建了以下模块来从alexa.com获取具有特定地址的数据值: alexa.com/siteinfo/clashofclans.com

特定值嵌套在下表中: http://imgur.com/JB11PT2

我尝试获取数据"访客百分比"来自美国的上图,其值为9.1%,但该代码仅在美国处于第一排/第一位时才有效。 http://imgur.com/yMBmdbs

以下VBA代码是我尝试抓取的:

Sub ExtractAlexa()
  Dim tickername As String
  Dim doc As HTMLDocument

  ie.Visible = False
  ie.navigate "http://www.alexa.com/siteinfo/clashofclans.com"

  Do
    DoEvents
  Loop Until ie.readyState = READYSTATE_COMPLETE

  Application.Wait (Now + TimeValue("00:00:4"))
  Set doc = ie.document

  Set elems = doc.getElementById("demographics_div_country_table").getElementsByTagName("tr")
  For Each e In elems        
    If e.outerHTML Like "*/topsites/countries/US*" Then
      Sheet2.Range("E11").Value = Trim(doc.getElementsByTagName("td")(1).innerText)
    End If
  Next e

  ie.Quit
End Sub

拜托,有谁知道我在哪里错了? 谢谢。

2 个答案:

答案 0 :(得分:1)

使用CSS选择器组合,可以通过使用父id元素和带有td标签的子元素的子孙组合来获取表中的所有表单元格,从而实现相同的目的。循环直到找到美国字符串,并采用+ 1索引来获取%。

Option Explicit
Public Sub GetPercentage()
    Dim sResponse As String, html As HTMLDocument, tds As Object, i As Long

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.alexa.com/siteinfo/clashofclans.com", False
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With

    Set html = New HTMLDocument
    html.body.innerHTML = sResponse

    Set tds = html.querySelectorAll("#demographics_div_country_table td")
    For i = 0 To tds.Length - 1
        If Trim$(tds.item(i).innerText) = "United States" Then
            ThisWorkbook.Worksheets("Sheet2").Range("E11").Value = tds.item(i + 1).innerText
        Exit Sub
        End If
    Next i
End Sub

答案 1 :(得分:0)

你基本上已经钉了它 - 但我认为你在这一行中有一个微妙的错误:

Sheet2.Range("E11").Value = Trim(doc.getElementsByTagName("td")(1).innerText)

应该是:

Sheet2.Range("E11").Value = Trim(e.getElementsByTagName("td")(1).innerText)

此外,您可以使用WinHTTP来避免与IE进行一些争论:

Public Sub ExtractAlexa()
    Dim oHTML As MSHTML.HTMLDocument
    Dim elems As MSHTML.IHTMLElementCollection
    Dim e As MSHTML.IHTMLElement
    Set oHTML = New MSHTML.HTMLDocument
    With CreateObject("WINHTTP.WinHTTPRequest.5.1")
        .Open "GET", "http://www.alexa.com/siteinfo/clashofclans.com", False
        .send
        oHTML.body.innerHTML = .responseText
    End With
    oHTML.getElementById("demographics_div_country_table").getElementsByTagName ("tr")
    Set elems = oHTML.getElementById("demographics_div_country_table").getElementsByTagName("tr")
    For Each e In elems
        If e.outerHTML Like "*/topsites/countries/US*" Then
            Sheet2.Range("E11").Value = Trim(e.getElementsByTagName("td")(1).innerText)
            Exit For
        End If
    Next e
End Sub