Scraping HTML elements that have no id in VBA

Date: 2016-12-22 19:57:27

Tags: html excel vba web-scraping

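I am trying to pull the month-to-date (MTD) and year-to-date (YTD) returns from the page http://us.spindices.com/indices/equity/sp-oil-gas-exploration-production-select-industry-index into an Excel spreadsheet using VBA. The problem is that the page markup has no "id=" attributes, which I understand would make this much simpler. There is also the issue that only one time period (YTD or MTD) is visible at a time, but for now I would be happy just to scrape the MTD value.

Here is my code: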

Sub Get_Change()

'attempting to scrape Barclay's website

Dim appIE As Object
Dim MyVar As String


Set appIE = CreateObject("internetexplorer.application")

With appIE
    .Navigate "http://us.spindices.com/indices/equity/sp-oil-gas-exploration-production-select-industry-index"
    .Visible = True
End With

Do While appIE.Busy
    DoEvents
    Range("A1").Value = "Working..."
Loop
Set TDelements = appIE.document.getElementsbyClassName("performance-chart-table")

For Each TDelement In TDelements
    If TDelement.class = "change" Then
        MyVar = TDelement.class.innerText("Value")

    End If
Next
Range("A1").Value = MyVar
appIE.Quit
Set appIE = Nothing


End Sub

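If I could find a way to set the 'MyVar' variable to the current MTD or YTD value I would be done, but I am stumped because none of these values has a unique identifier. Any ideas?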

1 Answer:

Answer 0: (score: 0)

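I recently watched some CSS training videos, and I can tell you that CSS selector syntax is powerful; I recommend it. It is the same syntax that JavaScript / web developers use to select elements with jQuery.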

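I think you should try using

document.querySelectorAll

or, in your case, since you have already drilled into the document to get the "performance-chart-table" elements, call querySelectorAll on an item of the TDelements collection.

The documentation is at http://www.w3schools.com/jsref/met_document_queryselectorall.asp

You supply a CSS selector string as the argument; the selector syntax is described at http://www.w3schools.com/cssref/css_selectors.asp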

I went ahead and did it for you...

Sub Get_Change()
    '* Tools-References Microsoft HTML Object Library

    'attempting to scrape Barclay's website

    Dim appIE As Object
    Dim MyVar As String


    Set appIE = CreateObject("internetexplorer.application")

    With appIE
        .Navigate "http://us.spindices.com/indices/equity/sp-oil-gas-exploration-production-select-industry-index"
        .Visible = True
    End With

    Range("A1").Value = "Working..."
    ' Wait until navigation has finished (readyState 4 = READYSTATE_COMPLETE)
    Do While appIE.Busy Or appIE.readyState <> 4
        DoEvents
    Loop
    Dim htmlDoc As MSHTML.HTMLDocument
    Set htmlDoc = appIE.document

    Dim TDelements2 As MSHTML.IHTMLElementCollection
    Set TDelements2 = htmlDoc.getElementsByClassName("performance-chart-table")
    While TDelements2.Length < 1

        DoEvents
        Application.Wait (Now() + TimeSerial(0, 0, 3))
        Set TDelements2 = htmlDoc.getElementsByClassName("performance-chart-table")

    Wend        

    Dim oHTMLTablePerformanceChartTable As MSHTML.HTMLTable
    Set oHTMLTablePerformanceChartTable = TDelements2.Item(0)


    Dim objChangeCollection As MSHTML.IHTMLDOMChildrenCollection
    Set objChangeCollection = oHTMLTablePerformanceChartTable.querySelectorAll(".change")

    'Debug.Assert objChangeCollection.Length = 2

    Dim objChange2 As Object
    Set objChange2 = objChangeCollection.Item(1)


    MyVar = objChange2.innerText




    'Set TDelements = appIE.document.getElementsByClassName("performance-chart-table")
    '
    'For Each TDelement In TDelements
    '    TDelements.querySelectorAll (".change")
    '    If TDelement.class = "change" Then
    '        MyVar = TDelement.class.innerText("Value")
    '
    '    End If
    'Next
    Range("A1").Value = MyVar
    appIE.Quit
    Set appIE = Nothing


End Sub
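
The two lookups above (getElementsByClassName followed by querySelectorAll) could probably be folded into a single descendant selector. The following is only a minimal sketch: GetChangeText is a hypothetical helper name, and it assumes the page still uses the classes performance-chart-table and change and that Internet Explorer renders the page in a document mode that supports querySelectorAll.

```vba
' Hypothetical helper, not part of the original answer.
' Requires Tools > References > Microsoft HTML Object Library.
Function GetChangeText(ByVal htmlDoc As MSHTML.HTMLDocument, ByVal index As Long) As String
    Dim matches As MSHTML.IHTMLDOMChildrenCollection

    ' Descendant selector: every element with class "change" that sits
    ' inside an element with class "performance-chart-table".
    Set matches = htmlDoc.querySelectorAll(".performance-chart-table .change")

    ' Item is zero-based; return an empty string if the index is out of range.
    If index >= 0 And index < matches.Length Then
        GetChangeText = matches.Item(index).innerText
    End If
End Function
```

Called as MyVar = GetChangeText(htmlDoc, 1) once the table has loaded, this should read the same cell that the code above reads through objChangeCollection.Item(1).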