VBA: scraping HTML listing links

Date: 2019-01-05 17:17:55

Tags: html excel vba excel-vba web-scraping

I have the following working code, which pulls all of the listing links from the web page below. I am now looking to extend it to fetch the next pages of results (up to n pages). I am having trouble doing this (the second half of the code): nothing is displayed.

Note: in this code example I am trying to place the second page of links in column B, but ideally I would like to append the links to the bottom of the page 1 results (in column A); a sketch of this appears after the code below.

Update: the code now moves through each page of results, but the links it pastes into column A are identical to those pasted into columns B, C and so on. I am not sure what is going wrong here, because I can see the browser changing URL as it goes.

Also, if you have a different, better approach (rather than copy/pasting 10 times to get the desired result), please let me know!

Option Explicit
Public Sub GetLinks()
    Dim ie As New InternetExplorer, ws As Worksheet, t As Date
    Dim k As Integer
    Const MAX_WAIT_SEC As Long = 10
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    With ie
        .Visible = True
        k = 0
        Do While k < 10
            .Navigate2 "https://www.ebay.com/sch/i.html?_from=R40&_nkw=iPhone&_sacat=0&_ipg=200&_pgn=" & k

            While .Busy Or .readyState < 4: DoEvents: Wend

            Dim Links As Object, i As Long, count As Long
            t = Timer
            Do
                On Error Resume Next
                Set Links = .Document.querySelectorAll(".s-item__link[href]")
                count = Links.Length
                On Error GoTo 0
                If Timer - t > MAX_WAIT_SEC Then Exit Do
            Loop While count = 0
            For i = 0 To Links.Length - 1
                ws.Cells(i + 1, k + 1) = Links.item(i)
            Next
            k = k + 1
        Loop

        .Quit
    End With
End Sub
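As a rough sketch of the append-to-column-A idea mentioned in the note above (my own illustration, not part of the original code): rather than writing each page into a new column with ws.Cells(i + 1, k + 1), you could look up the last used row of column A and write the new links below it. The helper name AppendLinksToColumnA and the explicit .href read are assumptions for illustration.

Private Sub AppendLinksToColumnA(ByVal Links As Object, ByVal ws As Worksheet)
    'Hypothetical helper: writes each matched link's href below the last used row of column A.
    Dim i As Long, nextRow As Long
    nextRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row
    If Len(ws.Cells(nextRow, 1).Value) > 0 Then nextRow = nextRow + 1
    For i = 0 To Links.Length - 1
        ws.Cells(nextRow + i, 1).Value = Links.item(i).href
    Next
End Sub

Inside the Do loop, the write step would then become AppendLinksToColumnA Links, ws instead of the column-offset write.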

2 Answers:

Answer 0 (score: 3):

I would probably add a test to make sure the number of pages you request does not exceed the number of pages available, and modularise the code a little to pull out the information-extraction step. Using arrays and a few basic optimisations (switching off screen updating) speeds the whole thing up. Also, quit the ie object as soon as possible.

This sets the results-per-page count to 200 (in practice, with the given selector, you get 211 results per page). I'm not sure whether that is simply a remembered eBay setting or the default.

Option Explicit
'Requires a reference to Microsoft Internet Controls for the InternetExplorer type.
Public Sub GetInfo()
    Dim ie As InternetExplorer, nodeList As Object, page As Long, totalResults As Long, ws As Worksheet
    Const RESULTS_PER_PAGE = 211                 'observed result count per page with the selector below
    Const DESIRED_PAGES = 3                      'number of pages you want to pull
    Const BASE = "https://www.ebay.com/sch/i.html?_from=R40&_nkw=iPhone&_sacat=0&_ipg=200&_pgn="
    Dim results(), url As String, maxPages As Long
    ReDim results(1 To DESIRED_PAGES)            'one array of links per page
    Application.ScreenUpdating = False
    Set ie = New InternetExplorer
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    With ie
        .Visible = True
        For page = 1 To DESIRED_PAGES
            url = BASE & page                    'page number is supplied via the _pgn query parameter
            .Navigate2 url
            While .Busy Or .readyState < 4: DoEvents: Wend
            If page = 1 Then
                'read the total result count once and work out how many pages actually exist
                totalResults = Replace$(.document.querySelector(".srp-controls__count-heading").innerText, " results", vbNullString)
                maxPages = totalResults / RESULTS_PER_PAGE
            End If
            Set nodeList = .document.querySelectorAll("#srp-river-results .s-item__link[href]")
            results(page) = GetLinks(nodeList)   'store this page's links as an array
            Set nodeList = Nothing
            If page + 1 >= maxPages Then Exit For 'stop early if no more pages are available
        Next
        .Quit                                    'release the browser before writing to the sheet
    End With
    If maxPages < DESIRED_PAGES Then ReDim Preserve results(1 To maxPages)
    'write each page's links below the previous page's results in column A
    For page = LBound(results) To UBound(results)
        If page = 1 Then
            ws.Cells(1, 1).Resize(UBound(results(page), 1)) = Application.Transpose(results(page))
        Else
            ws.Cells(GetLastRow(ws, 1) + 1, 1).Resize(UBound(results(page), 1)) = Application.Transpose(results(page))
        End If
    Next
    Application.ScreenUpdating = True
End Sub

'Copy the node list items into a 1-based array (one entry per matched link)
Public Function GetLinks(ByVal nodeList As Object) As Variant
    Dim results(), i As Long
    ReDim results(1 To nodeList.Length)
    For i = 0 To nodeList.Length - 1
        results(i + 1) = nodeList.item(i)
    Next
    GetLinks = results
End Function

'Last used row in the given column of the given worksheet
Public Function GetLastRow(ByVal ws As Worksheet, Optional ByVal columnNumber As Long = 1) As Long
    With ws
        GetLastRow = .Cells(.Rows.Count, columnNumber).End(xlUp).Row
    End With
End Function
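A brief note on the design choices above: each page's links are gathered into an in-memory array and written to the sheet in a single Resize assignment per page, which is considerably faster than writing cell by cell, and the InternetExplorer instance is quit before any worksheet writes so the browser is released as early as possible, per the points made at the start of this answer.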

Answer 1 (score: 2):

Untested (so something may be missing), but it looks like you can simply use the _pgn URL query parameter to specify which page to visit.

For example, navigating to the following URL:

https://www.ebay.com/sch/i.html?_from=R40&_nkw=iPhone&_sacat=0&_ipg=200&_pgn=2

means you are requesting page 2 (similarly, the _ipg parameter appears to determine how many results are shown per page, so increasing it to 200 probably means fewer requests overall).

So if you create some variable pageNumber in your code and increment it inside some kind of loop (terminating once you reach the last page), you should be able to get every page, or indeed any page at an arbitrary index, without any pasting/duplication in your code. A minimal sketch follows.
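A minimal sketch of that loop, assuming the same InternetExplorer approach used in the question; the sub name GetAllPages and the fixed LAST_PAGE bound are illustrative assumptions, since this answer does not give code for detecting the real last page:

Option Explicit
'Sketch only: requires a reference to Microsoft Internet Controls.
Public Sub GetAllPages()
    Const LAST_PAGE As Long = 5                  'assumed upper bound, for illustration only
    Const BASE As String = "https://www.ebay.com/sch/i.html?_from=R40&_nkw=iPhone&_sacat=0&_ipg=200&_pgn="
    Dim ie As New InternetExplorer, pageNumber As Long
    With ie
        .Visible = True
        For pageNumber = 1 To LAST_PAGE
            .Navigate2 BASE & pageNumber         'the _pgn query parameter selects the results page
            While .Busy Or .readyState < 4: DoEvents: Wend
            'extract the .s-item__link[href] nodes here, as in the question code
        Next
        .Quit
    End With
End Sub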