VBA Web Automation: scraping innerText from a table or by tag name (td)

Date: 2017-05-25 15:08:57

Tags: vba web automation scrape

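I have been trying to scrape data from the website soccerstats, specifically the results for the football team Arsenal (http://www.soccerstats.com/team.asp?league=england&teamid=15).

(There are several tables on the page; I am after the data in the largest one.)

My current code scrapes the contents of every td tag, which produces a mess: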

'requires references to Microsoft Internet Controls and Microsoft HTML Object Library
'start a new subroutine called soccer_stats
Sub soccer_stats()
 
    'dimension (declare or set aside memory for) our variables
    Dim objIE As InternetExplorer 'special object variable representing the IE browser
    Dim aEle As HTMLLinkElement 'special object variable for an <a> (link) element
    Dim y As Integer 'integer variable we'll use as a counter
    Dim result As String 'string variable that will hold our result link
    Dim Variable1 As String
 
 Variable1 = InputBox("put in what you are searching")
 
    'initiating a new instance of Internet Explorer and assigning it to objIE
    Set objIE = New InternetExplorer
 
    'make IE browser visible (False would allow IE to run in the background)
    objIE.Visible = True
 
    'navigate IE to the soccerstats home page
    objIE.navigate "http://www.soccerstats.com/"
 
    'wait here a few seconds while the browser is busy
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
 
Dim ele As Object

For Each ele In objIE.document.getElementsByTagName("input")
    If ele.Name = "searchstring" Then
        ele.Value = Variable1
    End If
Next ele

For Each ele In objIE.document.getElementsByTagName("input")
    If ele.className = "submit" Then
        ele.Click
    End If
Next ele


    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop

For Each ele In objIE.document.getElementsByTagName("a")
    If ele.innerText = Variable1 Then
        ele.Click
    End If
Next ele

    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop

    'new bit: write the text of every <td> on the page to Sheet2, column A
    y = 2
    For Each ele In objIE.document.getElementsByTagName("td")

        '...get the innertext and print it to the sheet in col A, row y
        result = ele.innerText
        Sheets("Sheet2").Range("A" & y).Value = result

        y = y + 1
    Next ele
  
  
  Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
End Sub

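Is there a way to paste the innerText into columns A, B, C, D depending on whether a cell meets criteria i, ii, iii, iv?

The first column of the table has the HTML: <td height="18" align="right"> 14 Aug</td>

Could I change my code to something like For Each ele In objIE.document.getElementsByTagName("td") AND height = "18"?

And for the next column in the table, whose HTML has no height attribute, could I change it to For Each ele In objIE.document.getElementsByTagName("td") AND height = null?

Or is there a better way to scrape the whole table? Thanks for your help.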
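
For the whole-table question, one possible approach is to walk the table's rows and cells instead of every td on the page. The sketch below is only an illustration: the procedure name ScrapeLargestTable is hypothetical, it assumes objIE has already finished loading the team page, and it assumes the results table is the one with the most rows, which may not hold if the page uses a large layout table.

```vba
'Sketch only. Assumes objIE is an InternetExplorer instance that has
'finished loading the team page, and that the wanted table is the one
'with the most rows (an assumption about the page, not a verified fact).
Sub ScrapeLargestTable(objIE As Object)
    Dim tbl As Object, bigTbl As Object
    Dim r As Long, c As Long

    'pick the table with the most rows
    For Each tbl In objIE.document.getElementsByTagName("table")
        If bigTbl Is Nothing Then
            Set bigTbl = tbl
        ElseIf tbl.Rows.Length > bigTbl.Rows.Length Then
            Set bigTbl = tbl
        End If
    Next tbl
    If bigTbl Is Nothing Then Exit Sub

    'one sheet row per <tr>, one sheet column per cell, starting at A2
    For r = 0 To bigTbl.Rows.Length - 1
        For c = 0 To bigTbl.Rows(r).Cells.Length - 1
            Sheets("Sheet2").Cells(r + 2, c + 1).Value = _
                Trim$(bigTbl.Rows(r).Cells(c).innerText)
        Next c
    Next r
End Sub
```

Calling ScrapeLargestTable objIE after the last wait loop in soccer_stats would keep date, home team, score and away team on the same sheet row, so no per-cell attribute test is needed.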

Edit:

The HTML for each column on the page is as follows. Date column:

<td height="18" align="right"> 14 Aug</td>

Home team column:

<td align="right"><b>Arsenal</b></td>

Score column:

   <td width="45" align="center">
<a class="tooltip2" href="#">
<font color="#0000aa">
<b>3 – 4</b>

Away team column:

    <td align="left">
Liverpool
</td>
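
Given the column HTML above, the attribute test asked about earlier could be sketched as follows. For Each cannot take a condition, so the check goes inside the loop. ColumnForCell and WriteByAttribute are hypothetical names, and the mapping of attributes to sheet columns is an assumption taken only from the four snippets shown; td cells in other tables on the page may carry the same attributes and would be picked up as well.

```vba
'Hypothetical helper: guess the sheet column for a <td> from its attributes.
'Appending "" turns a missing (Null) attribute into an empty string.
Function ColumnForCell(td As Object) As String
    Dim h As String, a As String
    h = td.getAttribute("height") & ""
    a = LCase$(td.getAttribute("align") & "")

    If h = "18" And a = "right" Then
        ColumnForCell = "A"      'date
    ElseIf a = "right" Then
        ColumnForCell = "B"      'home team
    ElseIf a = "center" Then
        ColumnForCell = "C"      'score
    ElseIf a = "left" Then
        ColumnForCell = "D"      'away team
    Else
        ColumnForCell = ""       'not a cell we recognise
    End If
End Function

'Sketch only: assumes objIE has finished loading the team page.
Sub WriteByAttribute(objIE As Object)
    Dim td As Object, col As String, y As Long
    y = 1
    For Each td In objIE.document.getElementsByTagName("td")
        col = ColumnForCell(td)
        If col = "A" Then y = y + 1          'a date cell starts a new sheet row
        If col <> "" And y > 1 Then
            Sheets("Sheet2").Range(col & y).Value = Trim$(td.innerText)
        End If
    Next td
End Sub
```

Because this relies on presentational attributes, it is more fragile than reading the table row by row; it is shown mainly to answer the height = "18" question directly.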

0 answers:

No answers yet