Greyhound数据导入到excel宏公式

时间:2018-02-28 11:10:20

标签: excel vba excel-vba

作为研究项目的一部分,我需要从网页中提取尽可能多的数据。问题是访问每个表我必须遵循许多我无法自动工作的链接。

来自greyhound-data.com。举个例子,我想提取2017年1月1日至2018年2月28日期间参加比赛的每只狗的所有比赛统计数据。当我把它放入搜索引擎时,我会在一张桌子上获得57236场比赛。我必须按照每场比赛的比赛名称链接..

http://www.greyhound-data.com/d?racename=&country=13000&startmonth=3&endmonth=2&startdate=2017&enddate=2018&maxdist=unlimitied&class=any&order=dateD&x=2

我最大的问题是我不知道如何按照各种链接说。而且我不知道如何循环多次 - 原始列表中的每个比赛一次。

我创建了简单的宏查询:

Sub GetData()

Dim IE As Object
Dim doc As Object
Dim strURL As String
Dim I As Integer

For I = 1 To 9
strURL = "http://www.greyhound-data.com/d?racename=&country=13000&startmonth=3&endmonth=2&startdate=2017&enddate=2018&maxdist=unlimitied&class=any&order=dateD&x=" + Trim(Str(I))

Set IE = CreateObject("InternetExplorer.Application")
With IE

.navigate strURL
Do Until .ReadyState = 4: DoEvents: Loop
Do While .Busy: DoEvents: Loop
Set doc = IE.Document
GetAllTables doc

.Quit

End With
Next I

End Sub

Sub GetAllTables(doc As Object)

Dim ws As Worksheet
Dim rng As Range
Dim tbl As Object
Dim rw As Object
Dim cl As Object
Dim tabno As Long
Dim nextrow As Long
Dim I As Long

Set ws = Worksheets.Add

For Each tbl In doc.getElementsByTagName("TABLE")
tabno = tabno + 1
nextrow = nextrow + 1
Set rng = ws.Range("B" & nextrow)
rng.Offset(, -1) = "Table " & tabno
For Each rw In tbl.Rows
For Each cl In rw.Cells
rng.Value = cl.outerText
Set rng = rng.Offset(, 1)
I = I + 1
Next cl
nextrow = nextrow + 1
Set rng = rng.Offset(1, -I)
I = 0
Next rw
Next tbl

End Sub

它会自动从网址中检索所有种族数据。但无法迈出下一步。在每个页面上都有一个“竞赛名称”选项卡,我需要获取每一行的每一页的所有数据。因为我需要获得第一名,第二名和第三名的信息。

感谢您的时间,我知道它有点乱码!!

更改后的新代码如下所示:

Sub GetData()

Dim IE As Object
Dim doc As Object
Dim strURL As String
Dim I As Integer

For I = 1 To 9
strURL = "http://www.greyhound-data.com/d?racename=&country=13000&startmonth=3&endmonth=2&startdate=2017&enddate=2018&maxdist=unlimitied&class=any&order=dateD&x=" + Trim(Str(I))

Set IE = CreateObject("InternetExplorer.Application")
With IE

.navigate strURL
Do Until .ReadyState = 4: DoEvents: Loop
Do While .Busy: DoEvents: Loop
Set doc = IE.Document
GetAllTables doc

.Quit

End With
Next I

End Sub


Sub GetAllTables(doc As Object)

Dim ws As Worksheet
Dim rng As Range
Dim tbl As Object
Dim rw As Object
Dim cl As Object
Dim tabno As Long
Dim nextrow As Long
Dim I As Long
Dim ThisLink As Object 'variable for <a> tags
Set ws = Worksheets.Add

For Each tbl In doc.getElementsByTagName("TABLE")
    tabno = tabno + 1
    nextrow = nextrow + 1
    Set rng = ws.Range("B" & nextrow)

    rng.Offset(, -1) = "Table " & tabno
    For Each rw In tbl.Rows
        For Each cl In rw.Cells
            rng.Value = cl.outerText
            Set rng = rng.Offset(, 1)
            I = I + 1
        Next cl
    nextrow = nextrow + 1
    Set rng = rng.Offset(1, -I)
    I = 0
    Next rw
Next tbl

I = Range("B" & Rows.Count).End(xlUp).Row 'last row with data

Do While Cells(I, 1).Value = "" 'will loop until first not blank found in column A (starting from last row of data, from end to start)
    For Each ThisLink In doc.getElementsByTagName("a") 'we check all <a> tags
        If ThisLink.innerText = Cells(I, 2).Value Then Cells(I, 1).Value = ThisLink.href 'If the innertext is the name of the race, in column A we add link
    Next ThisLink
    I = I - 1 'we decrease row position
Loop
End Sub

但情况是它返回空表,如此链接:https://imageshack.us/i/poC4yhEZp

1 个答案:

答案 0 :(得分:1)

此代码在您获取所有数据后,将检查从列表末尾到列表开头的每个种族。它将在A列中添加相关的竞赛链接。

Sub GetAllTables(doc As Object)

Dim ws As Worksheet
Dim rng As Range
Dim tbl As Object
Dim rw As Object
Dim cl As Object
Dim tabno As Long
Dim nextrow As Long
Dim I As Long
Dim ThisLink As Object 'variable for <a> tags
Set ws = Worksheets.Add

For Each tbl In doc.getElementsByTagName("TABLE")
    tabno = tabno + 1
    nextrow = nextrow + 1
    Set rng = ws.Range("B" & nextrow)

    rng.Offset(, -1) = "Table " & tabno
    For Each rw In tbl.Rows
        For Each cl In rw.Cells
            rng.Value = cl.outerText
            Set rng = rng.Offset(, 1)
            I = I + 1
        Next cl
    nextrow = nextrow + 1
    Set rng = rng.Offset(1, -I)
    I = 0
    Next rw
Next tbl

I = Range("B" & Rows.Count).End(xlUp).Row 'last row with data

Do While Cells(I, 1).Value = "" 'will loop until first not blank found in column A (starting from last row of data, from end to start)
    For Each ThisLink In doc.getElementsByTagName("a") 'we check all <a> tags
        If ThisLink.innerText = Cells(I, 2).Value Then Cells(I, 1).Value = ThisLink.href 'If the innertext is the name of the race, in column A we add link
    Next ThisLink
    I = I - 1 'we decrease row position
Loop
End Sub

在HTML中,所有<a>标记都是这样的:

<a href="d?r=4269456&z=F0K9jn">Henlow 26 Feb 2018 HT 5</a>

href 属性包含与<a></a>之间的文字相关的链接。您可以在VBA中使用a.href获取它

要了解<a></a>之间的文字,您可以使用a.InnerText

我所做的只是一个检查每个<a>标记的简单循环。如果InnerText匹配单元格中的值,那么我将获得href属性。

此代码将为您提供所需的所有链接。只需根据您的需要调整代码(我将它们粘贴在A列中,但也许您想对它们执行其他操作)。

此代码需要2个引用才能生效;

  1. Microsoft HTML对象库
  2. Microsoft Internet Controls
  3. enter image description here

    这是最终结果:

    enter image description here