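I'm trying to scrape table data from a web page using Selenium. However, my code parses every table on the page, and I only need one of them. I don't know how to select a single table. Here is what I've tried: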
Sub table_data()
Dim driver As New WebDriver
Dim tabl As Object, rdata As Object, cdata As Object
Set driver = New WebDriver
driver.Start "Phantomjs", "https://fantasy.premierleague.com"
driver.get "/player-list/"
For Each tabl In driver.FindElementsByXPath("//table[@class='ism-table']")
For Each rdata In tabl.FindElementsByXPath(".//tr")
For Each cdata In rdata.FindElementsByXPath(".//td")
y = y + 1
Cells(x, y) = cdata.Text
Next cdata
x = x + 1
y = 0
Next rdata
Next tabl
End Sub
I got it working with XHR!
Sub TableData()
Dim xmlpage As New XMLHTTP60
Dim htmldoc As New MSHTML.HTMLDocument
Dim htmlas As Object, tRow As Object, tCel As Object
x = 1
With xmlpage
.Open "GET", "https://fantasy.premierleague.com/player-list/", False
.send
htmldoc.body.innerHTML = .responseText
End With
Set htmlas = htmldoc.getElementsByTagName("table")(2)
For Each tRow In htmlas.Rows
For Each tCel In tRow.Cells
c = c + 1
Cells(x, c) = tCel.innerText
Next tCel
c = 0
x = x + 1
Next tRow
End Sub
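The index passed to getElementsByTagName("table") in the code above is zero-based, so (2) picks the third table on the page. A minimal sketch of the same routine with that index pulled out into a constant and a bounds check added; the Sub name and the TABLE_INDEX constant are hypothetical, and it assumes the same references as above (Microsoft XML v6.0 and Microsoft HTML Object Library):

```vba
Sub TableDataByIndex()
    Const TABLE_INDEX As Long = 2   ' zero-based; assumed to be the wanted table, as above
    Dim xmlpage As New XMLHTTP60
    Dim htmldoc As New MSHTML.HTMLDocument
    Dim tables As Object, tRow As Object, tCel As Object
    Dim x As Long, c As Long

    With xmlpage
        .Open "GET", "https://fantasy.premierleague.com/player-list/", False
        .send
        htmldoc.body.innerHTML = .responseText
    End With

    ' Collect every table, then make sure the requested one exists.
    Set tables = htmldoc.getElementsByTagName("table")
    If TABLE_INDEX >= tables.Length Then
        MsgBox "Only " & tables.Length & " table(s) found."
        Exit Sub
    End If

    x = 1
    For Each tRow In tables(TABLE_INDEX).Rows
        c = 0
        For Each tCel In tRow.Cells
            c = c + 1
            Cells(x, c) = tCel.innerText
        Next tCel
        x = x + 1
    Next tRow
End Sub
```

If the page layout changes and the table moves, only TABLE_INDEX needs adjusting.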
Answer 0 (score: 3)
You can exit the outer For Each loop once the first table has been processed:
Sub table_data()
Dim driver As New WebDriver
Dim tabl As Object, rdata As Object, cdata As Object
Set driver = New WebDriver
Dim x As Long, y As Long
x = 1 ' Cells(0, y) is not a valid address, so start writing at row 1
driver.Start "Phantomjs", "https://fantasy.premierleague.com"
driver.get "/player-list/"
For Each tabl In driver.FindElementsByXPath("//table[@class='ism-table']")
For Each rdata In tabl.FindElementsByXPath(".//tr")
For Each cdata In rdata.FindElementsByXPath(".//td")
y = y + 1
Cells(x, y) = cdata.Text
Next cdata
x = x + 1
y = 0
Next rdata
Exit For ' stop after the first matching table
Next tabl
End Sub
Or just take the first element of what FindElementsByXPath returns: driver.FindElementsByXpath(....)(0) should return the first item.
-- (Edit) --
According to the docs, you should be able to use driver.FindElementsByXpath(....).Item(4) to get the right table.
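A minimal sketch of the single-table idea, assuming the SeleniumBasic library (the WebDriver type used in the question) and assuming the first ism-table is the one wanted. Instead of indexing the returned collection, it puts the position into the XPath itself; XPath positions are 1-based, so [1] is the first match. The Sub name is hypothetical.

```vba
Sub table_data_single()
    Dim driver As New WebDriver
    Dim tabl As Object, rdata As Object, cdata As Object
    Dim x As Long, y As Long

    driver.Start "Phantomjs", "https://fantasy.premierleague.com"
    driver.Get "/player-list/"

    ' Select only the first matching table (change [1] to pick another one).
    Set tabl = driver.FindElementByXPath("(//table[@class='ism-table'])[1]")

    x = 1
    For Each rdata In tabl.FindElementsByXPath(".//tr")
        y = 0
        For Each cdata In rdata.FindElementsByXPath(".//td")
            y = y + 1
            Cells(x, y) = cdata.Text
        Next cdata
        x = x + 1
    Next rdata

    driver.Quit
End Sub
```

Because only one table element is ever fetched, no loop exit is needed.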
Answer 1 (score: 1)
Actually, you can do this with XHR and Split; there is no need for Selenium. Take a look at the code below:
Option Explicit
Sub Scrape_premierleague_com()
Dim sResponse, j, i, aRows, aCells
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://fantasy.premierleague.com/player-list/", False
.Send
sResponse = .responseText
End With
ThisWorkbook.Sheets(1).Cells.Delete
sResponse = Split(Split(sResponse, "<tbody>")(1), "</tbody>", 2)(0) ' 1 - number of the table
aRows = Split(sResponse, "<tr>")
For j = 1 To UBound(aRows)
aCells = Split(aRows(j), "<td>")
For i = 1 To UBound(aCells)
ThisWorkbook.Sheets(1).Cells(j, i).Value = Split(aCells(i), "</td>", 2)(0)
Next
Next
ThisWorkbook.Sheets(1).Columns.AutoFit
End Sub
Here is my output: