How to extract HTML data directly into VBA

Time: 2018-06-07 00:31:50

Tags: javascript html vba web-scraping

<div class="r_title">
    <h1 data-securitycontent="name">Fidelity® Japan Smaller Companies</h1>
    <span class="gry">&nbsp;FJSCX</span>
    <span data-msat="span-securityInformation-star" class="r_star3"></span>

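How can I extract r_star3 from this? r_star3 stands for 3 stars. So far I can get the element's inner text, but the stars are symbols, so the inner text is blank, and r_star3 appears to be the element's own class. I want to extract r_star3 as a string and use an If statement to see how many stars it has. Anything helps, thanks.

Edit:

This is what I have so far using a query selector, but querySelector prints [object HTMLSpanElement]. I have pasted only the relevant code. This is where the stars are located (the stars are on the right).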

    .navigate "http://www.morningstar.com/funds/xnas/" & Range("A" & Row.Row).Value & "/quote.html"
    Do
    DoEvents
    Loop Until ie.readyState = READYSTATE_COMPLETE
    Dim doc As HTMLDocument
    Set doc = ie.document
    While ie.readyState <> 4
    Wend

    Application.Wait (Now + TimeValue("0:00:04"))

    Dim tblName As Object
    Dim span As Object

    On Error Resume Next





    'FIND THE STAR (Work in Progress)
    Set tblName = doc.getElementsByClassName("reports_nav")(0)
    Set span = tblName.getElementsByTagName("span").Item(1)


    Dim s As String, rating As Long
    s = doc.querySelector("span[class*=""r_star""]")
    MsgBox (s)

    rating = Replace(Split(Split(s, "class=" & Chr$(34))(1), Chr$(34))(0), "r_star", vbNullString)
    Range("C" & Row.Row).Value = rating
    MsgBox (rating)

1 answer:

Answer 0 (score: 0)

You can use a CSS selector to get the target HTML. For example, the following selector matches the element in question:

span[data-msat="span-securityInformation-star"]

This returns the target span:

[Image: CSS query]

You can then parse the outerHTML of the returned element to get the star rating.
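As an alternative to splitting outerHTML, the rating can probably be read from the element's className property, which holds only the class attribute value. The sketch below is a pure string helper; the name StarRatingFromClass is hypothetical (it is not part of the original answer), and it assumes the class always follows the pattern r_star<N> with a single digit, as in the r_star3 example from the question.

```vba
Option Explicit

' Hypothetical helper (not from the original answer): returns the number of
' stars encoded in a class attribute such as "r_star3", or 0 if none is found.
Public Function StarRatingFromClass(ByVal className As String) As Long
    Const PREFIX As String = "r_star"   ' assumed naming pattern: r_star<N>
    Dim token As Variant, suffix As String

    ' A class attribute can hold several space-separated class names
    For Each token In Split(Trim$(className), " ")
        If Left$(token, Len(PREFIX)) = PREFIX Then
            suffix = Mid$(token, Len(PREFIX) + 1)
            ' Accept a single digit only (assumption: one-digit star ratings)
            If suffix Like "#" Then
                StarRatingFromClass = CLng(suffix)
                Exit Function
            End If
        End If
    Next token
End Function
```

With the element held in a variable such as a, the call would be StarRatingFromClass(a.className). Returning 0 for an unrecognized class lets the caller tell "no rating found" apart from a real rating without relying on error handling.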

Code:

Option Explicit

' Requires a reference to "Microsoft Internet Controls" (Tools > References).
Public Sub Get_Information()
    Dim IE As New InternetExplorer

    With IE
        .Visible = True
        .navigate "http://www.morningstar.com/funds/xnas/seatx/quote.html"
        ' Wait for the initial page load to finish
        While .Busy = True Or .readyState < 4: DoEvents: Wend

        Dim a As Object, exitTime As Date
        exitTime = Now + TimeSerial(0, 0, 5)

        ' Poll for up to 5 seconds until the star element has been found and set
        Do
            DoEvents
            On Error Resume Next
            Set a = .document.querySelector("span[data-msat=""span-securityInformation-star""]")
            On Error GoTo 0
            If Now > exitTime Then Exit Do
        Loop While a Is Nothing

        If a Is Nothing Then
            .Quit   ' Close the browser before giving up
            Exit Sub
        End If

        ' outerHTML contains class="r_star3"; take the class value and strip the "r_star" prefix
        Dim rating As Long
        rating = CLng(Replace(Split(Split(a.outerHTML, "class=" & Chr$(34))(1), Chr$(34))(0), "r_star", vbNullString))

        MsgBox rating

        .Quit
    End With
End Sub
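On why the code in the question printed [object HTMLSpanElement]: querySelector returns an element object, and assigning that object to a String variable without Set most likely falls back to the element's default string form instead of its markup. A minimal sketch of the fix follows; the function name ReadStarClass is hypothetical, and doc is assumed to be the already loaded document from the question's code.

```vba
Option Explicit

' Hypothetical helper: returns the class attribute of the first span whose
' class contains "r_star" (e.g. "r_star3"), or an empty string if none exists.
' doc is assumed to be the loaded HTMLDocument (ie.document in the question).
Public Function ReadStarClass(ByVal doc As Object) As String
    Dim el As Object

    ' Set is required because querySelector returns an object, not a string
    Set el = doc.querySelector("span[class*=""r_star""]")

    ' Read a string property of the element rather than the element itself
    If Not el Is Nothing Then ReadStarClass = el.className
End Function
```

The returned string can then be compared in an If statement, for example If ReadStarClass(doc) = "r_star3" Then, or have its r_star prefix stripped to obtain the number.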