How to scrape data from a table in the following format with VBA

Asked: 2016-01-10 08:31:43

Tags: excel vba excel-vba web-scraping

I am trying to scrape all the tables on this webpage, from the first page through to the last.

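With the code below I can scrape the table contents of page 1, but I don't know how to modify it so that it collects the data from the first page through to the last.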

Option Explicit

Sub NBAStats()
    Dim IE As Object
    Dim r As Long, c As Long, t As Long
    Dim elemCollection As Object

    Set IE = CreateObject("InternetExplorer.Application")

    With IE
        .Visible = True
        .navigate "http://stats.nba.com/league/player/#!/"

        ' Wait until the page has finished loading (READYSTATE_COMPLETE = 4).
        Do While .ReadyState <> 4 Or .Busy
            DoEvents
        Loop

        ThisWorkbook.Worksheets(1).Cells.Clear

        Set elemCollection = .Document.getElementsByTagName("TABLE")

        ' Copy every cell of every table into the first worksheet.
        ' Each table starts at A1, so a later table overwrites an earlier one.
        For t = 0 To elemCollection.Length - 1
            For r = 0 To elemCollection(t).Rows.Length - 1
                For c = 0 To elemCollection(t).Rows(r).Cells.Length - 1
                    ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1) = elemCollection(t).Rows(r).Cells(c).innerText
                Next c
            Next r
        Next t
    End With

    Set IE = Nothing
End Sub

2 Answers:

Answer 0 (score: 1)

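Try to find the sitemap.xml of the website you are scraping. A sitemap.xml file lists the links to the pages of the site.

Import that XML file into an Excel worksheet, read each link, and fetch every table from each page.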
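As a rough sketch of the first step, the macro below downloads a sitemap and writes every <loc> URL into column A of the first worksheet. The sitemapUrl value is a placeholder, not a known address; whether the site you are scraping publishes a sitemap at all is something you have to check yourself. It assumes MSXML 6 is installed, which is the case on current Windows versions.

```vba
Option Explicit

' Sketch: list every <loc> entry of a sitemap in column A of the first sheet.
Sub ListSitemapLinks()
    ' Hypothetical address; replace it with the real sitemap URL, if one exists.
    Const sitemapUrl As String = "http://example.com/sitemap.xml"
    Dim http As Object, doc As Object, locs As Object
    Dim i As Long

    ' Download the sitemap synchronously.
    Set http = CreateObject("MSXML2.XMLHTTP.6.0")
    http.Open "GET", sitemapUrl, False
    http.send
    If http.Status <> 200 Then
        MsgBox "Request failed with HTTP status " & http.Status
        Exit Sub
    End If

    ' Parse the response text as XML.
    Set doc = CreateObject("MSXML2.DOMDocument.6.0")
    doc.async = False
    If Not doc.LoadXML(http.responseText) Then
        MsgBox "Response is not valid XML: " & doc.parseError.reason
        Exit Sub
    End If

    ' Each <loc> element holds the URL of one page.
    Set locs = doc.getElementsByTagName("loc")
    For i = 0 To locs.Length - 1
        ThisWorkbook.Worksheets(1).Cells(i + 1, 1).Value = locs.Item(i).Text
    Next i
End Sub
```

Each listed URL can then be visited in turn to read its tables. For pages that build their tables with JavaScript, a plain HTTP request to those URLs will not contain the table data.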

Answer 1 (score: 1)

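First of all, in my opinion VBA automation of Internet Explorer is highly unstable and not really practicable for production use. That also means that scraping data from websites that are made only for viewing in a browser is not really practicable for production use. If you are entitled to use the data, you should ask for another data source (XML or JSON, for example). If you are not entitled, then you should not do it. The provider of the website may not agree with it.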

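To be clear, I am talking about websites like this one, which deliver their data only through JavaScript. If the data is contained in the HTML itself, you can fetch it via XMLHTTP. That is a different matter.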
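For that other case, a minimal sketch of the XMLHTTP approach could look like this. It applies only when the table is already present in the HTML the server returns, which is not true of the page in the question. The pageUrl value is a made-up placeholder.

```vba
Option Explicit

' Sketch: fetch a page over HTTP and copy its first <table> into the first sheet.
' Works only if the table is part of the served HTML, not rendered by JavaScript.
Sub StaticTableViaXmlHttp()
    ' Hypothetical address of a page whose HTML already contains a table.
    Const pageUrl As String = "http://example.com/page-with-table.html"
    Dim http As Object, html As Object
    Dim tables As Object, firstTable As Object
    Dim r As Long, c As Long

    Set http = CreateObject("MSXML2.XMLHTTP.6.0")
    http.Open "GET", pageUrl, False
    http.send
    If http.Status <> 200 Then Exit Sub

    ' Parse the response with an in-memory HTML document; no browser window is opened.
    Set html = CreateObject("htmlfile")
    html.body.innerHTML = http.responseText

    Set tables = html.getElementsByTagName("table")
    If tables.Length = 0 Then Exit Sub
    Set firstTable = tables(0)

    For r = 0 To firstTable.Rows.Length - 1
        For c = 0 To firstTable.Rows(r).Cells.Length - 1
            ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1).Value = firstTable.Rows(r).Cells(c).innerText
        Next c
    Next r
End Sub
```

Because no script runs in this in-memory document, a page that fills its table with JavaScript yields an empty or missing table here.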

Nevertheless, I will provide a "solution", so that you cannot simply think "he can't do it himself, and that is why he says you shouldn't."

So you have to analyze the site and pick out the elements that can be clicked to navigate between pages.

Option Explicit

Sub NBAStats()
 Dim IE As Object
 Dim r As Long, c As Long, rSheet As Long, rStart As Long
 Dim bReady As Boolean
 Dim elementsTable As Object
 Dim elementsPageNavRigth As Object
 Dim elemPageNavRigth As Object
 Dim elementsTableDiv As Object

 ThisWorkbook.Worksheets(1).Cells.Clear

 Set IE = CreateObject("InternetExplorer.Application")

 With IE
  .Visible = True
  .navigate "http://stats.nba.com/league/player/#!/"
  Do While .Busy
   DoEvents
  Loop

  rSheet = 0 ' number of worksheet rows already filled

  Do
   ' Wait until the JavaScript-rendered table container is available.
   Do While elementsTableDiv Is Nothing
    Set elementsTableDiv = .Document.getElementsByClassName("table-responsive")
    DoEvents
   Loop

   Do While elementsTableDiv(0) Is Nothing
    DoEvents
   Loop

   ' The "next page" control; on the last page its class name ends in "disabled".
   Set elementsPageNavRigth = .Document.getElementsByClassName("page-nav right")
   Set elemPageNavRigth = elementsPageNavRigth(0)

   If Not elemPageNavRigth Is Nothing Then
    If elemPageNavRigth.className = "page-nav right disabled" Then bReady = True
   End If

   ' Left disabled: when enabled, row 0 (the header) is skipped on pages after the first.
   'If rSheet = 0 Then rStart = 0 Else rStart = 1

   ' Append the rows of the current page below the rows already written.
   Set elementsTable = elementsTableDiv(0).getElementsByTagName("TABLE")
   For r = rStart To (elementsTable(0).Rows.Length - 1)
    For c = 0 To (elementsTable(0).Rows(r).Cells.Length - 1)
     ThisWorkbook.Worksheets(1).Cells(r + rSheet + 1, c + 1) = elementsTable(0).Rows(r).Cells(c).innerText
    Next c
   Next r

   rSheet = rSheet + r

   ' Go to the next page and force a fresh lookup of the table container.
   If Not elemPageNavRigth Is Nothing Then elemPageNavRigth.Click

   Set elementsTableDiv = Nothing

  Loop Until bReady Or elemPageNavRigth Is Nothing

 End With
 Set IE = Nothing
End Sub