I have the following working code that extracts all of the listing links from the web page below. I'm now looking to extend it to fetch the next pages of results (up to n of them). I'm having trouble with this part (the second half of the code): nothing gets written out.
Note: in this code example I'm trying to put the second page of links in column B, but ideally I'd like the links appended below the page-1 results in column A.
Update: this code now moves through each page of results, but the links it pastes into column A are the same as the ones pasted into columns B, C, and so on. I'm not sure how that's happening, since I can see the browser changing URLs as it goes.
Also, if you have a different, better approach (rather than copying/pasting the block 10 times to get the desired result), please let me know!
Option Explicit

Public Sub GetLinks()
    Dim ie As New InternetExplorer, ws As Worksheet, t As Date
    Dim k As Integer
    Const MAX_WAIT_SEC As Long = 10
    Set ws = ThisWorkbook.Worksheets("Sheet1")

    With ie
        .Visible = True
        k = 0
        Do While k < 10
            .Navigate2 "https://www.ebay.com/sch/i.html?_from=R40&_nkw=iPhone&_sacat=0&_ipg=200&_pgn=" & k

            While .Busy Or .readyState < 4: DoEvents: Wend

            Dim Links As Object, i As Long, count As Long
            t = Timer
            Do
                On Error Resume Next
                Set Links = .Document.querySelectorAll(".s-item__link[href]")
                count = Links.Length
                On Error GoTo 0
                If Timer - t > MAX_WAIT_SEC Then Exit Do
            Loop While count = 0

            For i = 0 To Links.Length - 1
                ws.Cells(i + 1, k + 1) = Links.item(i)
            Next
            k = k + 1
        Loop
        .Quit
    End With
End Sub
Answer 0 (score: 3)
I would probably want to add a test to make sure you don't request more pages than are actually available. The code below modularises things slightly by pulling out the link-extraction step, and uses arrays plus some basic optimisation (switching off screen updating) to speed the whole thing up. Also, quit the ie object as soon as you're done with it.
This sets the listings-per-page count to 200 (in practice, with the given selector, you get 211 results per page). I'm not sure whether that's just a remembered eBay setting or the default.
Option Explicit

Public Sub GetInfo()
    Dim ie As InternetExplorer, nodeList As Object, page As Long, totalResults As Long, ws As Worksheet
    Const RESULTS_PER_PAGE = 211
    Const DESIRED_PAGES = 3
    Const BASE = "https://www.ebay.com/sch/i.html?_from=R40&_nkw=iPhone&_sacat=0&_ipg=200&_pgn="
    Dim results(), url As String, maxPages As Long

    ReDim results(1 To DESIRED_PAGES)
    Application.ScreenUpdating = False
    Set ie = New InternetExplorer
    Set ws = ThisWorkbook.Worksheets("Sheet1")

    With ie
        .Visible = True
        For page = 1 To DESIRED_PAGES
            url = BASE & page
            .Navigate2 url
            While .Busy Or .readyState < 4: DoEvents: Wend

            If page = 1 Then
                totalResults = Replace$(.document.querySelector(".srp-controls__count-heading").innerText, " results", vbNullString)
                maxPages = totalResults / RESULTS_PER_PAGE
            End If

            Set nodeList = .document.querySelectorAll("#srp-river-results .s-item__link[href]")
            results(page) = GetLinks(nodeList)
            Set nodeList = Nothing
            If page + 1 >= maxPages Then Exit For
        Next
        .Quit
    End With

    If maxPages < DESIRED_PAGES Then ReDim Preserve results(1 To maxPages)

    For page = LBound(results) To UBound(results)
        If page = 1 Then
            ws.Cells(1, 1).Resize(UBound(results(page), 1)) = Application.Transpose(results(page))
        Else
            ws.Cells(GetLastRow(ws, 1) + 1, 1).Resize(UBound(results(page), 1)) = Application.Transpose(results(page))
        End If
    Next
    Application.ScreenUpdating = True
End Sub

Public Function GetLinks(ByVal nodeList As Object) As Variant
    Dim results(), i As Long
    ReDim results(1 To nodeList.Length)
    For i = 0 To nodeList.Length - 1
        results(i + 1) = nodeList.item(i)
    Next
    GetLinks = results
End Function

Public Function GetLastRow(ByVal ws As Worksheet, Optional ByVal columnNumber As Long = 1) As Long
    With ws
        GetLastRow = .Cells(.Rows.Count, columnNumber).End(xlUp).Row
    End With
End Function
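A side note on the maxPages calculation above: assigning the result of the floating-point division to a Long rounds to the nearest whole number (VBA's banker's rounding), so a short final page, one holding fewer than roughly half of RESULTS_PER_PAGE results, would be dropped from the count. If you want that partial last page included, a minimal sketch using the usual VBA ceiling idiom (same variables as above, only the rounding step changes):

    ' Round up so a partial final page still counts as a page;
    ' -Int(-x) is the common VBA idiom for Ceiling(x).
    maxPages = -Int(-totalResults / RESULTS_PER_PAGE)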
Answer 1 (score: 2)
Untested (and it may be missing something), but it looks like you can use the URL query parameter _pgn to specify which page to visit.
For example, navigating to the following URL:
https://www.ebay.com/sch/i.html?_from=R40&_nkw=iPhone&_sacat=0&_ipg=200&_pgn=2
means you're requesting page 2 (likewise, the _ipg parameter seems to determine how many results are shown per page, so increasing it to 200 probably means fewer requests overall).
So if you create some variable pageNumber in your code and increment it inside some kind of loop (terminating once you reach the last page), you should be able to fetch every page, or any page at an arbitrary index, without the paste/repeat in your code.
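For what it's worth, here is a minimal sketch of that loop, untested like the answer itself. It assumes ie is an already-created InternetExplorer instance as in the question's code, and lastPage is a placeholder you would set or derive (for example from the results count, as the other answer does):

    Dim pageNumber As Long, lastPage As Long
    Const BASE As String = "https://www.ebay.com/sch/i.html?_from=R40&_nkw=iPhone&_sacat=0&_ipg=200&_pgn="
    lastPage = 3                                            ' assumption: set or derive the real last page here
    For pageNumber = 1 To lastPage
        ie.Navigate2 BASE & pageNumber                      ' request page pageNumber via the _pgn parameter
        While ie.Busy Or ie.readyState < 4: DoEvents: Wend  ' wait for the page to finish loading
        ' ...collect the ".s-item__link[href]" nodes here and append them below column A...
    Next pageNumber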