一切都完成后,我的刮刀抛出错误而不是退出浏览器

时间:2017-12-27 13:30:03

标签: vba excel-vba web-scraping internet-explorer-11 excel

我在vba中编写了一个刮刀来解析一些来自torrent网站的电影信息。我使用IEqueryselector来完成任务。当我执行我的代码时,它会解析所有内容以及弹出错误。似乎错误出现了,而不是继续。如果我取消错误框,那么我可以看到结果。我在下面上传了两张图片,向您展示我遇到的错误。如何在没有任何错误的情况下成功执行代码?提前谢谢。

以下是完整代码:

Sub Torrent_Data()
    Dim IE As New InternetExplorer, html As HTMLDocument
    Dim post As Object

    With IE
        .Visible = False
        .navigate "https://yts.am/browse-movies"
        Do While .readyState <> READYSTATE_COMPLETE: Loop
        Set html = .Document
    End With

    For Each post In html.querySelectorAll(".browse-movie-bottom")
        Row = Row + 1: Cells(Row, 1) = post.queryselector(".browse-movie-title").innerText
        Cells(Row, 2) = post.queryselector(".browse-movie-year").innerText
    Next post
    IE.Quit
End Sub

我遇到的错误:

First error

Second error

两个错误同时出现。 我正在使用Internet Explorer 11。

另一方面,如果我尝试如下,它会成功地带来结果,没有任何问题。

Sub Torrent_Data()
    Dim IE As New InternetExplorer, html As HTMLDocument
    Dim post As Object

    With IE
        .Visible = False
        .navigate "https://yts.am/browse-movies"
        Do While .readyState <> READYSTATE_COMPLETE: Loop
        Set html = .Document
    End With

    For Each post In html.getElementsByClassName("browse-movie-bottom")
        Row = Row + 1: Cells(Row, 1) = post.queryselector(".browse-movie-title").innerText
        Cells(Row, 2) = post.queryselector(".browse-movie-year").innerText
    Next post
    IE.Quit
End Sub

参考我已添加到库中:

1. Microsoft Internet Controls
2. Microsoft HTML Object Library

那么,queryselector或者我在这里错过了什么来成功?是否有任何参考添加到库以摆脱错误?

2 个答案:

答案 0 :(得分:5)

好的,所以对该网页有一些严重不友好的事情。它对我来说一直在崩溃。所以我已经在脚本引擎/脚本控制中运行了一个javascript程序,它可以工作。

我希望你能遵循它。逻辑是在添加到ScriptEngine的javascript中。我得到两个节点列表,一个电影列表和一个年份列表;然后我逐步遍历每个数组,并将它们作为键值对添加到Microsoft Scripting Dictionary。

Option Explicit

'*Tools->References
'*    Microsoft Scripting Runtime
'*    Microsoft Scripting Control
'*    Microsoft Internet Controls
'*    Microsoft HTML Object Library

Sub Torrent_Data()
    Dim row As Long
    Dim IE As New InternetExplorer, html As HTMLDocument
    Dim post As Object

    With IE
        .Visible = True
        .navigate "https://yts.am/browse-movies"
        Do While .readyState <> READYSTATE_COMPLETE:
            DoEvents
        Loop
        Set html = .document
    End With

    Dim dicFilms As Scripting.Dictionary
    Set dicFilms = New Scripting.Dictionary

    Call GetScriptEngine.Run("getMovies", html, dicFilms)

    Dim vFilms As Variant
    vFilms = dicFilms.Keys

    Dim vYears As Variant
    vYears = dicFilms.Items

    Dim lRowLoop As Long
    For lRowLoop = 0 To dicFilms.Count - 1

        Cells(lRowLoop + 1, 1) = vFilms(lRowLoop)
        Cells(lRowLoop + 1, 2) = vYears(lRowLoop)

    Next lRowLoop

    Stop

    IE.Quit
End Sub

Private Function GetScriptEngine() As ScriptControl
    '* see code from this SO Q & A
    ' https://stackoverflow.com/questions/37711073/in-excel-vba-on-windows-how-to-get-stringified-json-respresentation-instead-of
    Static soScriptEngine As ScriptControl
    If soScriptEngine Is Nothing Then
        Set soScriptEngine = New ScriptControl
        soScriptEngine.Language = "JScript"

        soScriptEngine.AddCode "function getMovies(htmlDocument, microsoftDict) { " & _
                                    "var titles = htmlDocument.querySelectorAll('a.browse-movie-title'), i;" & _
                                    "var years = htmlDocument.querySelectorAll('div.browse-movie-year'), j;" & _
                                    "if ( years.length === years.length) {" & _
                                    "for (i=0; i< years.length; ++i) {" & _
                                    "   var film = titles[i].innerText;" & _
                                    "   var year = years[i].innerText;" & _
                                    "   microsoftDict.Add(film, year);" & _
                                    "}}}"

    End If
    Set GetScriptEngine = soScriptEngine
End Function

答案 1 :(得分:1)

该网站有API。检查e。 G。来自网址https://yts.am/api/v2/list_movies.json?page=1&limit=50的结果,实际上代表了来自最新电影类别第一页的50部电影,采用JSON格式。

看看下面的例子。 JSON.bas模块导入VBA项目以进行JSON处理。

Option Explicit

Sub Test()

    Dim sJSONString As String
    Dim vJSON
    Dim sState As String
    Dim lPage As Long
    Dim aRes()
    Dim i As Long
    Dim aData()
    Dim aHeader()

    With Sheets(1)
        .Cells.Delete
        .Cells.WrapText = False
    End With
    lPage = 1
    aRes = Array()
    Do
        With CreateObject("MSXML2.XMLHTTP")
            .Open "GET", "https://yts.am/api/v2/list_movies.json?page=" & lPage & "&limit=50", False
            .send
            sJSONString = .responseText
        End With
        JSON.Parse sJSONString, vJSON, sState
        If Not vJSON("data").Exists("movies") Then Exit Do
        vJSON = vJSON("data")("movies")
        ReDim Preserve aRes(UBound(aRes) + UBound(vJSON) + 1)
        For i = 0 To UBound(vJSON)
            Set aRes(UBound(aRes) - UBound(vJSON) + i) = vJSON(i)
        Next
        lPage = lPage + 1
        Debug.Print "Parsed " & (UBound(aRes) + 1)
        DoEvents
    Loop
    JSON.ToArray aRes, aData, aHeader
    With Sheets(1)
        OutputArray .Cells(1, 1), aHeader
        Output2DArray .Cells(2, 1), aData
        .Columns.AutoFit
    End With
    MsgBox "Completed"

End Sub

Sub OutputArray(oDstRng As Range, aCells As Variant)

    With oDstRng
        .Parent.Select
        With .Resize(1, UBound(aCells) - LBound(aCells) + 1)
            .NumberFormat = "@"
            .Value = aCells
        End With
    End With

End Sub

Sub Output2DArray(oDstRng As Range, aCells As Variant)

    With oDstRng
        .Parent.Select
        With .Resize( _
                UBound(aCells, 1) - LBound(aCells, 1) + 1, _
                UBound(aCells, 2) - LBound(aCells, 2) + 1)
            .NumberFormat = "@"
            .Value = aCells
        End With
    End With

End Sub

我的输出如下,目前共有7182部电影:

output

顺便说一句,类似的方法适用于以下答案:1234567891011121314和{{ 3}}