如何使用vba从Datastore.prime中抓取数据

时间:2015-12-25 14:42:11

标签: javascript excel vba excel-vba web-scraping

我正在尝试从这个网站抓取数据http://www.whoscored.com/regions/252/tournaments/2/england-premier-league

当我使用inspect元素时,我看到数据是表格格式,如下图所示。

enter image description here

源代码包含此格式的数据。

enter image description here

  

DataStore.prime('stagefixtures',$。exptend({stageId:12496,isAggregate:false},calendar.parameter()),[[959688,1,'Monday,Dec 21 2015','20:00 ',13,'阿森纳',0,167,'曼城',0,'2:1','2:0',1,1,'FT','1',0,1,112,0]   ,[959683,4,'星期六,2015年12月26日','12:45',96,'斯托克',0,32,'曼联',0,'vs',, 0,1 ,,' - 1 ”,0,1,13,0]   ,[959615,4,'星期六,2015年12月26日','15:00',24,'阿斯顿维拉',0,29,'西汉姆',0,'vs',, 0,1 ,,' - 1' ,0,1,6,0]   ,[959625,4,'星期六,2015年12月26日','15:00',183,'伯恩茅斯',0,162,'水晶宫',0,'vs',, 0,1 ,,' - 1', 0,1,10,0]   ,[959635,4,'星期六,2015年12月26日','15:00',15,'切尔西',0,27,'沃特福德',0,'vs',, 0,1 ,,' - 1' ,0,1,15,0]   ,[959645,4,'星期六,2015年12月26日','15:00',26,'利物浦',0,14,'莱斯特',0,'vs',, 0,1 ,,' - 1' ,0,1,15,0]   ,[959655,4,'星期六,2015年12月26日','15:00',167,'曼城',0,16,'桑德兰',0,'vs',, 0,1 ,,' - 1 ”,0,1,4,0​​]   ,[959691,4,'星期六,2015年12月26日','15:00',259,'斯旺西',0,175,'West Bromwich Albion',0,'vs',, 0,1 ,,' - 1' ,0,1,5,0]   ,[959698,4,'星期六,2015年12月26日','15:00',30,'托特纳姆',0,168,'诺维奇',0,'vs',, 0,1 ,,' - 1',0 ,1,8,0]   ,[959665,4,'星期六,2015年12月26日','17:30',23,'纽卡斯尔联队',0,31,'埃弗顿',0,'vs',, 0,1 ,,' - 1 ”,0,1,7,0]   ,[959674,4,'星期六,2015年12月26日','19:45',18,'南安普顿',0,13,'阿森纳',0,'vs',, 0,1 ,,' - 1' ,0,1,11,0]   ]);

这段代码应该从表格格式中抓取数据,但在这种情况下我不知道怎么做。

Option Explicit

Sub WeeklyFixtures()


 Dim IE As Object, obj As Object

 Dim r As Integer, c As Integer, t As Integer
 Dim elemCollection As Object

 Set IE = CreateObject("InternetExplorer.Application")

 With IE
 .Visible = True
 .navigate ("http://www.whoscored.com/regions/252/tournaments/2/england-premier-league")
While IE.ReadyState <> 4
DoEvents
Wend

 Do While IE.busy: DoEvents: Loop

ThisWorkbook.Sheet1.Clear

 Set elemCollection = IE.Document.getElementsByTagName("TABLE")

    For t = 0 To (elemCollection.Length - 1)

        For r = 0 To (elemCollection(t).Rows.Length - 1)
            For c = 0 To (elemCollection(t).Rows(r).Cells.Length - 1)
                ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1) = elemCollection(t).Rows(r).Cells(c).innerText
            Next c
        Next r
    Next t

 End With

 Set IE = Nothing

 End Sub

2 个答案:

答案 0 :(得分:1)

你有Excel 2016吗?在那里你应该能够得到它:Data - &gt; New Query - &gt; From Other Sources - &gt; From Web。在那里,您可以输入您的URL。如果你记录了所有这些,你甚至可以获得相应的VBA代码。

enter image description here

答案 1 :(得分:1)

试试这段代码:

Option Explicit

Sub GetWhoscoredData()

    Dim strCont, arrRows, strComma, arrQuots, i, arrCols

    With CreateObject("Microsoft.XMLHTTP")
        .Open "GET", "http://www.whoscored.com/regions/252/tournaments/2/england-premier-league", False
        .Send
        strCont = .ResponseText
    End With

    strCont = Split(strCont, "'stagefixtures'")(1)
    strCont = Split(strCont, "[[")(1)
    strCont = Split(strCont, "]);")(0)
    strCont = Replace(strCont, vbCrLf, "")
    strComma = Mid(CreateObject("Scriptlet.TypeLib").GUID, 2, 36)
    arrQuots = Split(strCont, "'")
    For i = 1 To UBound(arrQuots) Step 2
        arrQuots(i) = Replace(arrQuots(i), ",", strComma)
    Next
    strCont = Join(arrQuots, "")
    arrRows = Split(strCont, "],[")
    For i = 0 To UBound(arrRows)
        arrCols = Split(arrRows(i), ",")
        Cells(i + 1, 1).Value = Replace(arrCols(2), strComma, ",")
        Cells(i + 1, 2).Value = Replace(arrCols(3), strComma, ",")
        Cells(i + 1, 3).Value = Replace(arrCols(14), strComma, ",")
        Cells(i + 1, 4).Value = Replace(arrCols(5), strComma, ",")
        Cells(i + 1, 5).NumberFormat = "@"
        Cells(i + 1, 5).Value = Replace(arrCols(10), strComma, ",")
        Cells(i + 1, 6).Value = Replace(arrCols(8), strComma, ",")
    Next
    Cells.Columns.AutoFit

End Sub

它给我的输出如下:

output