Unable to get the titles from a webpage

Time: 2018-07-29 11:25:15

Tags: vba excel-vba web-scraping internet-explorer-11

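I've written a script in VBA, in combination with IE, to get the titles of the different charts from a webpage, but I can't get it to work. It seems I've used the right class name and tag name to reach the content, but no dice. It doesn't throw any error either.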

This is my attempt so far:

Sub GetTitle()
    Const Url As String = "https://www.fbatoolkit.com/"
    Dim IE As New InternetExplorer, Html As HTMLDocument
    Dim post As Object

    With IE
        .Visible = True
        .navigate Url
        While .Busy = True Or .readyState < 4: DoEvents: Wend
        Set Html = .document
    End With

    Application.Wait Now + TimeValue("00:00:05")

    For Each post In Html.getElementsByClassName("chart")
        With post.getElementsByTagName("text")
          If .Length Then R = R + 1: Cells(R, 1) = .item(0).innerText
        End With
    Next post
End Sub

The titles look like the following and are visible above each chart:

Toys & Games
Health & Household

I'd rather not use any solution related to [...]. Thanks.

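2 Answers:

Answer 0 (score: 1)

To be honest, this is a bit of a cheat. Please treat it as a placeholder until I find a better way, since I'm guessing you specifically want to access those titles.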

Option Explicit

' Requires references to "Microsoft Internet Controls" and "Microsoft HTML Object Library".
Public Sub GetInfo()
    Dim ie As New InternetExplorer, html As HTMLDocument, titles(), i As Long
    With ie
        .Visible = True
        .navigate "https://www.fbatoolkit.com/"
        While .Busy Or .readyState < 4: DoEvents: Wend
        Set html = .document
        ' Read the titles from the chart element ids instead of the rendered text
        titles = GetTitles(html.body.innerHTML, "id=""visualization([^""]*)")
        For i = LBound(titles) To UBound(titles)
            Debug.Print titles(i)
        Next
        .Quit '<== Remember to quit application
    End With
End Sub

' Returns the first capture group of every match of sPattern, converted to a readable title.
Public Function GetTitles(ByVal inputString As String, ByVal sPattern As String) As Variant
    Dim Matches As Object, iMatch As Object, s As String, arrMatches(), i As Long
    With CreateObject("vbscript.regexp")
        .Global = True
        .MultiLine = True
        .IgnoreCase = True
        .Pattern = sPattern
        If .test(inputString) Then
            Set Matches = .Execute(inputString)
            For Each iMatch In Matches
                If iMatch.SubMatches(0) <> vbNullString Then
                    ReDim Preserve arrMatches(i)
                    ' "_" becomes " ", then the double space left by "__" becomes " & ";
                    ' Trim$ removes the space left by a leading underscore, if any
                    arrMatches(i) = Trim$(Replace$(Replace$(iMatch.SubMatches(0), Chr$(95), Chr$(32)), Chr$(32) & Chr$(32), Chr$(32) & Chr$(38) & Chr$(32)))
                    i = i + 1
                End If
            Next iMatch
        End If
    End With
    GetTitles = arrMatches
End Function
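
The nested Replace$ call with Chr$ codes above is fairly dense, so here is a rough sketch of the same id-to-title conversion as a standalone helper. It assumes the chart ids look like "visualization_toys__games", with a double underscore standing in for " & "; that format is an assumption, since the page markup is not shown here. The name IdToTitle is hypothetical and not part of the answer's code.

```vba
Option Explicit

' Hypothetical helper (an illustration, not the answer's own code).
' Assumes ids of the form "visualization_toys__games".
Public Function IdToTitle(ByVal chartId As String) As String
    Const PREFIX As String = "visualization_"
    Dim s As String

    ' Return an empty string when the id does not carry the expected prefix
    If Left$(chartId, Len(PREFIX)) <> PREFIX Then Exit Function

    s = Mid$(chartId, Len(PREFIX) + 1)   ' drop the prefix
    s = Replace$(s, "__", " & ")         ' double underscore marks the "&"
    s = Replace$(s, "_", " ")            ' remaining underscores are spaces
    IdToTitle = StrConv(s, vbProperCase) ' capitalise each word
End Function

Public Sub DemoIdToTitle()
    Debug.Print IdToTitle("visualization_toys__games")   ' prints Toys & Games
End Sub
```

Replacing the double underscore first avoids relying on the intermediate double space, and StrConv with vbProperCase works outside Excel as well.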

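Answer 1 (score: 1)

Although this answer is entirely influenced by QHarr's, I'd like to post it for future readers. Using the ids is the best option here. The following solution produces output that is almost identical to the category names.

Here it is: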

Sub GetChartInfo()
    Const Url As String = "https://www.fbatoolkit.com/"
    Dim IE As New InternetExplorer, Html As HTMLDocument
    Dim itemvisibility As Object, otitle As Object, I&

    With IE
        .Visible = False
        .navigate Url
        While .Busy = True Or .readyState < 4: DoEvents: Wend
        Set Html = .document
    End With

    ' Wait until at least one chart container has been rendered
    Do: Set itemvisibility = Html.querySelectorAll("div[class='chart-container']"): DoEvents: Loop While itemvisibility.Length = 0

    With Html.querySelectorAll("div[class='chart-container']")
        For I = 0 To .Length - 1
            ' Wait for the chart element inside this container
            Do: Set otitle = .Item(I).querySelector(".chart"): DoEvents: Loop While otitle Is Nothing
            ' Strip the "visualization_" prefix, turn underscores into spaces, and capitalise each word
            Cells(I + 1, 1) = Application.WorksheetFunction.Proper(Replace(Replace(Split(otitle.getAttribute("id"), "visualization_")(1), "__", " "), "_", " "))
        Next I
    End With

    IE.Quit ' close the hidden IE instance
End Sub
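
The Do ... Loop waits above have no upper bound, so they would spin forever if the charts never render. Below is a minimal sketch of a bounded wait. It assumes the same "Microsoft HTML Object Library" reference the code above already relies on; the name WaitForElements and the 10-second default are hypothetical choices, not something taken from either answer.

```vba
Option Explicit

' Hypothetical helper: polls the document until cssSelector matches at least
' one element or timeoutSeconds elapse, whichever comes first.
' Requires a reference to "Microsoft HTML Object Library" for HTMLDocument.
Public Function WaitForElements(ByVal doc As HTMLDocument, _
                                ByVal cssSelector As String, _
                                Optional ByVal timeoutSeconds As Double = 10) As Object
    Dim started As Single, elapsed As Single
    Dim nodes As Object

    started = Timer
    Do
        Set nodes = doc.querySelectorAll(cssSelector)
        If nodes.Length > 0 Then Exit Do
        DoEvents                                        ' let IE keep rendering
        elapsed = Timer - started
        If elapsed < 0 Then elapsed = elapsed + 86400   ' Timer resets at midnight
    Loop While elapsed < timeoutSeconds

    ' The caller checks .Length; 0 means the wait timed out
    Set WaitForElements = nodes
End Function
```

A call such as Set itemvisibility = WaitForElements(Html, "div[class='chart-container']") could stand in for the first wait loop, followed by a check that itemvisibility.Length is greater than 0 before reading the ids.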