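I am trying to scrape all the tables on this Webpage, from the first page to the last one. With the code below I can scrape the table on page 1, but I don't know how to change the code so that it collects the data from the first page through to the end.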
```vba
Option Explicit

Sub NBAStats()
    Dim IE As Object
    Dim r As Long, c As Long, t As Long
    Dim elemCollection As Object

    Set IE = CreateObject("InternetExplorer.Application")
    With IE
        .Visible = True
        .navigate "http://stats.nba.com/league/player/#!/"
        ' Wait until the page has finished loading
        While .ReadyState <> 4
            DoEvents
        Wend
        Do While .Busy: DoEvents: Loop

        ' ThisWorkbook has no "Sheet1" member; clear the first worksheet instead
        ThisWorkbook.Worksheets(1).Cells.Clear

        ' Copy every <table> on the page into the first worksheet, cell by cell
        Set elemCollection = .Document.getElementsByTagName("TABLE")
        For t = 0 To elemCollection.Length - 1
            For r = 0 To elemCollection(t).Rows.Length - 1
                For c = 0 To elemCollection(t).Rows(r).Cells.Length - 1
                    ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1) = elemCollection(t).Rows(r).Cells(c).innerText
                Next c
            Next r
        Next t
    End With
    Set IE = Nothing
End Sub
```
Answer 0 (score: 1)
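Try to find the sitemap.xml of the website you are scraping. A sitemap.xml lists the links to the pages of the site.
Import that XML file into an Excel sheet, then read each link and fetch every table from it.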
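A minimal sketch of the sitemap approach in VBA, assuming the site actually publishes a sitemap in the standard sitemaps.org format (not every site does, and nothing in the question confirms that stats.nba.com has one). The names `SitemapUrls` and `ListSitemapLinks` and the sitemap URL are hypothetical. The helper parses the sitemap text with MSXML and returns every `<loc>` URL; each URL can then be opened and its tables read.

```vba
Option Explicit

' Hypothetical helper: returns all <loc> URLs found in a sitemap.xml document.
' Assumes the sitemaps.org layout: <urlset><url><loc>...</loc></url></urlset>
Function SitemapUrls(ByVal xmlText As String) As Collection
    Dim doc As Object, locNodes As Object, i As Long
    Dim urls As New Collection

    Set doc = CreateObject("MSXML2.DOMDocument.6.0")
    doc.async = False
    If doc.LoadXML(xmlText) Then
        ' The sitemap's default namespace adds no prefix, so the tag name is just "loc"
        Set locNodes = doc.getElementsByTagName("loc")
        For i = 0 To locNodes.Length - 1
            urls.Add Trim$(locNodes.Item(i).Text)
        Next i
    End If
    Set SitemapUrls = urls
End Function

' Example: download the sitemap and list its links in column A of the first sheet.
' The sitemap URL below is an assumption, not taken from the question.
Sub ListSitemapLinks()
    Dim http As Object, url As Variant, rowNum As Long

    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", "http://stats.nba.com/sitemap.xml", False
    http.send

    rowNum = 1
    For Each url In SitemapUrls(http.responseText)
        ThisWorkbook.Worksheets(1).Cells(rowNum, 1).Value = url
        rowNum = rowNum + 1
    Next url
End Sub
```

Parsing is kept in a separate function so it can be checked against a small XML string without touching the network.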
Answer 1 (score: 1)
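First of all, in my opinion, automating Internet Explorer from VBA is very fragile and not really usable in production. It follows that scraping data from a website that is only meant to be viewed in a browser is not practical in production either. If you are entitled to use the data, you should ask for another data source (for example XML or JSON). If you are not entitled to it, you should not do this at all; the provider of the website may well not agree with it.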
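To be clear, I am talking about websites like this one, which deliver their data only through JavaScript. If the data is already in the HTML, you can fetch it with XMLHTTP. That is a different matter.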
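For the case where the table is already in the HTML returned by the server, a minimal sketch without Internet Explorer: download the page with `MSXML2.XMLHTTP`, load the text into an `htmlfile` document, and read its `<table>` elements. The names `HtmlDocFromText` and `CopyFirstTable` and the URL are hypothetical placeholders. This does not work for stats.nba.com itself, because that page builds its table with JavaScript after loading.

```vba
Option Explicit

' Hypothetical helper: loads an HTML string into an MSHTML document object.
Function HtmlDocFromText(ByVal htmlText As String) As Object
    Dim html As Object
    Set html = CreateObject("htmlfile")
    html.body.innerHTML = htmlText
    Set HtmlDocFromText = html
End Function

' Example: copy the first table of a static page into the first worksheet.
Sub CopyFirstTable()
    Dim http As Object, tables As Object, tbl As Object
    Dim r As Long, c As Long

    Set http = CreateObject("MSXML2.XMLHTTP")
    ' Placeholder URL: any page whose table is present in the raw HTML
    http.Open "GET", "http://example.com/page-with-table.html", False
    http.send

    Set tables = HtmlDocFromText(http.responseText).getElementsByTagName("TABLE")
    If tables.Length = 0 Then Exit Sub
    Set tbl = tables(0)

    ThisWorkbook.Worksheets(1).Cells.Clear
    For r = 0 To tbl.Rows.Length - 1
        For c = 0 To tbl.Rows(r).Cells.Length - 1
            ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1).Value = tbl.Rows(r).Cells(c).innerText
        Next c
    Next r
End Sub
```

Because the request is synchronous and no browser window is involved, there is no need for the `Busy`/`ReadyState` polling that the Internet Explorer version requires.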
Nevertheless, I will provide a "solution", so that you cannot simply think: "He just can't do it, so he says you shouldn't do it."
For this, you have to analyze the website and find the element you can click to move to the next page.
```vba
Option Explicit

Sub NBAStats()
    Dim IE As Object
    Dim r As Long, c As Long, rSheet As Long, rStart As Long
    Dim bReady As Boolean
    Dim elementsTable As Object
    Dim elementsPageNavRight As Object
    Dim elemPageNavRight As Object
    Dim elementsTableDiv As Object

    ThisWorkbook.Worksheets(1).Cells.Clear

    Set IE = CreateObject("InternetExplorer.Application")
    With IE
        .Visible = True
        .navigate "http://stats.nba.com/league/player/#!/"
        ' Wait until the initial page load has finished
        Do While .Busy Or .ReadyState <> 4
            DoEvents
        Loop

        rSheet = 0
        Do
            ' The table is rendered by JavaScript, so wait until its container exists
            Set elementsTableDiv = .Document.getElementsByClassName("table-responsive")
            Do While elementsTableDiv(0) Is Nothing
                DoEvents
            Loop

            ' The "next page" arrow; its class becomes "page-nav right disabled" on the last page
            Set elementsPageNavRight = .Document.getElementsByClassName("page-nav right")
            Set elemPageNavRight = elementsPageNavRight(0)
            If elemPageNavRight Is Nothing Then
                bReady = True
            ElseIf elemPageNavRight.className = "page-nav right disabled" Then
                bReady = True
            End If

            ' Keep the header row only from the first page
            If rSheet = 0 Then rStart = 0 Else rStart = 1

            Set elementsTable = elementsTableDiv(0).getElementsByTagName("TABLE")
            For r = rStart To elementsTable(0).Rows.Length - 1
                For c = 0 To elementsTable(0).Rows(r).Cells.Length - 1
                    ' Subtract rStart so no blank row is left where the skipped header was
                    ThisWorkbook.Worksheets(1).Cells(rSheet + r - rStart + 1, c + 1) = _
                        elementsTable(0).Rows(r).Cells(c).innerText
                Next c
            Next r
            rSheet = rSheet + elementsTable(0).Rows.Length - rStart

            If Not bReady Then
                ' Go to the next page. The new rows are loaded by JavaScript, so wait
                ' briefly before reading the table again (2 seconds is an arbitrary choice)
                elemPageNavRight.Click
                Application.Wait Now + TimeValue("0:00:02")
            End If
        Loop Until bReady
    End With

    Set IE = Nothing
End Sub
```