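Extracting nested child data that has no tag or attribute

Posted: 2019-01-09 14:33:06

Tags: excel vba web-scraping

I can't extract the scraped data from the website as separate values with VBA.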

Link to the site: soccer rating (http://www.soccer-rating.com/Manchester-City/220/)

With my code I get the date and the championship code together. The row looks like this:

<tr bgcolor="#ffffff">
   <td class="nomobil">30</td>
   <td>
     12.01.19
     <div class="ismobil">UK3</div>
   </td>
   .
   .
   .
   .
   .
</tr>

In the first cell I only get the championship code, not the date.

If I leave Children(0) in the code, I get this:

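Public Sub GetChampionshipCodes()
    Dim objIE As Object
    Dim itemEle As Object
    Dim td As Object          ' each <tr> row of the table
    Dim i As Long

    Set objIE = CreateObject("InternetExplorer.Application")
    objIE.Visible = True
    objIE.navigate "http://www.soccer-rating.com/Manchester-City/220/"
    ' Wait until the page has finished loading
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop

    ' Fifth element with class "bigtable" (index 4, zero-based)
    Set itemEle = objIE.document.getElementsByClassName("bigtable")(4)
    i = 0
    For Each td In itemEle.getElementsByTagName("tr")
        If i > 0 Then  ' skip the header row
            ' Second <td> of the row, then its first child element (<div class="ismobil">)
            Cells(i, 1) = td.getElementsByTagName("td")(1).Children(0).innerText
        End If
        i = i + 1
    Next td
End Sub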

This is what ends up in cells(i, 1).


2 answers:

Answer 0 (score: 0)

You can get the date like this:

Option Explicit
Public Sub GetDate()
    Dim ie As InternetExplorer
    Set ie = New InternetExplorer
    With ie
        .Visible = True
        .Navigate2 "http://www.soccer-rating.com/Manchester-City/220/"
        While .Busy Or .readyState < 4: DoEvents: Wend
        Debug.Print Split(.Document.querySelector(".bigtable:nth-of-type(3) .bigtable tr:nth-of-type(1) td").innerText, Chr$(32))(0)
        .Quit
    End With
End Sub

With an XMLHTTP request, without a browser:

Option Explicit    
Public Sub GetInfo()
    Dim html As HTMLDocument
    Set html = New HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "http://www.soccer-rating.com/Manchester-City/220/", False
        .send
        html.body.innerHTML = .responseText
    End With
    Debug.Print Split(html.getElementsByClassName("bigtable")(3).getElementsByTagName("TD")(0).innerText, Chr$(32))(0)
End Sub

This requires references to Microsoft Internet Controls and Microsoft HTML Object Library.
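In the row markup from the question, the date is a bare text node with no tag or attribute of its own, while the championship code sits inside <div class="ismobil">. So instead of splitting innerText, you could read the cell's first DOM node directly. The following is a minimal sketch, assuming the live page matches the row shown in the question. It parses a hard-coded copy of that row rather than downloading the page, and the procedure and helper names (SplitDateAndCode, CleanText) are hypothetical. It uses the same Microsoft HTML Object Library reference as above:

```vba
Option Explicit

' Sketch only: parses a hard-coded copy of the row from the question
' (assumed to match the live page) instead of downloading the page.
' Requires a reference to Microsoft HTML Object Library.
Public Sub SplitDateAndCode()
    Dim html As HTMLDocument
    Dim cell As Object
    Dim rowDate As String
    Dim champCode As String

    Set html = New HTMLDocument
    html.body.innerHTML = "<table><tr bgcolor=""#ffffff"">" & _
        "<td class=""nomobil"">30</td>" & _
        "<td>12.01.19<div class=""ismobil"">UK3</div></td>" & _
        "</tr></table>"

    ' Second <td> of the row: a text node (the date) followed by a <div> (the code)
    Set cell = html.getElementsByTagName("td")(1)

    ' The date has no element of its own, so read the first child *node*
    ' (nodeType 3 = text node) instead of the first child *element*
    If cell.firstChild.nodeType = 3 Then
        rowDate = CleanText(cell.firstChild.nodeValue)
    End If

    ' The championship code is inside the first child element, the <div>
    champCode = CleanText(cell.Children(0).innerText)

    Debug.Print rowDate, champCode
End Sub

' Hypothetical helper: turns line breaks and tabs from the page's indentation
' into spaces, then trims the result (Trim$ alone only removes spaces)
Private Function CleanText(ByVal s As String) As String
    s = Replace(s, vbCr, " ")
    s = Replace(s, vbLf, " ")
    s = Replace(s, vbTab, " ")
    CleanText = Trim$(s)
End Function
```

Inside the question's loop, the same idea would be td.getElementsByTagName("td")(1).firstChild.nodeValue for the date and .Children(0).innerText for the code. This assumes the date really is the first node in that cell on the live page.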

Answer 1 (score: 0)

Anyway, I solved it:

cells(i, 1) = Split(td.getElementsByTagName("td")(1).Children(0).innerText, Chr$(13))(1)

where Chr$(13) is the ASCII code for "Enter" (carriage return).
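Splitting on Chr$(13) works because, in the rendered text, the date and the championship code sit on separate lines. One detail worth knowing: if the line break is CR+LF (vbCrLf), splitting on the CR alone leaves the LF at the start of the second element. The following is a minimal sketch. The sample string is an assumption about what the cell's innerText looks like, and the procedure name is hypothetical:

```vba
Option Explicit

' Sketch only: the sample string stands in for a cell's innerText and is an
' assumption (date, CR+LF line break, championship code).
Public Sub SplitCellOnCarriageReturn()
    Dim cellText As String
    Dim parts() As String

    cellText = "12.01.19" & vbCrLf & "UK3"

    ' Chr$(13) is the carriage return, the same character as vbCr
    parts = Split(cellText, Chr$(13))

    ' parts(0) holds the text before the line break
    Debug.Print "Before CR: " & Trim$(parts(0))

    ' Guard: a cell without a line break yields only one element
    If UBound(parts) >= 1 Then
        ' The LF half of CR+LF stays at the start of parts(1), so strip it
        Debug.Print "After CR: " & Trim$(Replace(parts(1), vbLf, ""))
    End If
End Sub
```

The UBound check matters when looping over many rows. A cell without a line break would otherwise raise "Subscript out of range" on the (1) index.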