Unable to select a single table with Selenium

Date: 2017-05-21 18:42:16

Tags: vba selenium web-scraping

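I am trying to scrape table data from a web page with Selenium. My code parses every table on the page, but I only need one of them, and I don't know how to select a single table. Here is what I have tried: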

Sub table_data()
    ' Requires SeleniumBasic (reference "Selenium Type Library")
    Dim driver As New WebDriver
    Dim tabl As Object, rdata As Object, cdata As Object
    Dim x As Long, y As Long

    x = 1   ' start at row 1; Cells(0, y) would raise an error
    driver.Start "phantomjs", "https://fantasy.premierleague.com"
    driver.Get "/player-list/"
    ' This loops over EVERY table with class 'ism-table', which is the problem
    For Each tabl In driver.FindElementsByXPath("//table[@class='ism-table']")
        For Each rdata In tabl.FindElementsByXPath(".//tr")
            For Each cdata In rdata.FindElementsByXPath(".//td")
                y = y + 1
                Cells(x, y) = cdata.Text
            Next cdata
            x = x + 1
            y = 0
        Next rdata
    Next tabl
End Sub

I managed to do it with XHR:

Sub TableData()
    ' Requires references to "Microsoft XML, v6.0" and "Microsoft HTML Object Library"
    Dim xmlpage As New XMLHTTP60
    Dim htmldoc As New MSHTML.HTMLDocument
    Dim htmlas As Object, tRow As Object, tCel As Object
    Dim x As Long, c As Long

    x = 1
    With xmlpage
        .Open "GET", "https://fantasy.premierleague.com/player-list/", False
        .send
        htmldoc.body.innerHTML = .responseText
    End With
    ' getElementsByTagName is 0-based, so (2) is the third table on the page
    Set htmlas = htmldoc.getElementsByTagName("table")(2)
    For Each tRow In htmlas.Rows
        For Each tCel In tRow.Cells
            c = c + 1
            Cells(x, c) = tCel.innerText
        Next tCel
        c = 0
        x = x + 1
    Next tRow
End Sub

2 Answers:

Answer 0 (score: 3)

You can break out of the outer loop as soon as the rows of the first table have been written:

Sub table_data()
    Dim driver As New WebDriver
    Dim tabl As Object, rdata As Object, cdata As Object
    Dim x As Long, y As Long

    x = 1
    driver.Start "phantomjs", "https://fantasy.premierleague.com"
    driver.Get "/player-list/"
    For Each tabl In driver.FindElementsByXPath("//table[@class='ism-table']")
        For Each rdata In tabl.FindElementsByXPath(".//tr")
            For Each cdata In rdata.FindElementsByXPath(".//td")
                y = y + 1
                Cells(x, y) = cdata.Text
            Next cdata
            x = x + 1
            y = 0
        Next rdata
        Exit For   ' stop after the first table has been written
    Next tabl
End Sub

Or take only the first element returned by FindElementsByXPath, e.g. driver.FindElementsByXpath(....)(0), since the default Item property should return the first element.

Edit:

According to this documentation, you should be able to use driver.FindElementsByXpath(....).Item(4) to get the table you want.
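Another way to get a single table, instead of looping and breaking out, is to let XPath do the selection. The sketch below assumes SeleniumBasic (whose WebDriver has the singular FindElementByXPath, returning one element) and PhantomJS; the helper name TableXPath and the choice of table 1 are hypothetical, so change the number to the table you actually need.

```vba
' Sketch: scrape ONE table instead of looping over all of them.
' Assumes SeleniumBasic (reference "Selenium Type Library") with PhantomJS.

' Hypothetical helper: build an XPath that selects the n-th (1-based)
' table with class 'ism-table' in the whole document.
Function TableXPath(ByVal n As Long) As String
    ' The parentheses make [n] apply to the full result set,
    ' not to each table's position among its siblings.
    TableXPath = "(//table[@class='ism-table'])[" & n & "]"
End Function

Sub single_table_data()
    Dim driver As New WebDriver
    Dim tabl As Object, rdata As Object, cdata As Object
    Dim x As Long, y As Long

    driver.Start "phantomjs", "https://fantasy.premierleague.com"
    driver.Get "/player-list/"

    ' FindElementByXPath (singular) returns a single element.
    ' Table 1 is an assumption; change it to the table you need.
    Set tabl = driver.FindElementByXPath(TableXPath(1))

    x = 1
    For Each rdata In tabl.FindElementsByXPath(".//tr")
        y = 0
        For Each cdata In rdata.FindElementsByXPath(".//td")
            y = y + 1
            Cells(x, y) = cdata.Text
        Next cdata
        x = x + 1   ' rows without <td> (e.g. header rows) leave a blank line
    Next rdata

    driver.Quit
End Sub
```

Wrapping the parentheses around //table[...] matters: without them, [1] would match every table that is the first such table within its own parent, which can still return several tables.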

Answer 1 (score: 1)

Actually, you can do this with XHR and Split; there is no need to use Selenium. Take a look at the code below:

Option Explicit

Sub Scrape_premierleague_com()

    Dim sResponse, j, i, aRows, aCells

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://fantasy.premierleague.com/player-list/", False
        .Send
        sResponse = .responseText
    End With
    ThisWorkbook.Sheets(1).Cells.Delete
    ' Split(...)(1) takes the text after the first "<tbody>" (the first table);
    ' use (2) for the second table, and so on
    sResponse = Split(Split(sResponse, "<tbody>")(1), "</tbody>", 2)(0)
    aRows = Split(sResponse, "<tr>")
    For j = 1 To UBound(aRows)
        ' Assumes the <tr> and <td> tags carry no attributes
        aCells = Split(aRows(j), "<td>")
        For i = 1 To UBound(aCells)
            ThisWorkbook.Sheets(1).Cells(j, i).Value = Split(aCells(i), "</td>", 2)(0)
        Next
    Next
    ThisWorkbook.Sheets(1).Columns.AutoFit

End Sub

Here is my output:

[Screenshot: output]
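The index after the "<tbody>" split is what picks the table. The same Split logic can be wrapped in a small function that takes the table number as a parameter. NthTbody is a hypothetical helper name, and like the answer's code it assumes each table body starts with a literal "<tbody>" without attributes.

```vba
' Hypothetical helper: return the inner HTML of the n-th <tbody> (1-based),
' using the same Split technique as the answer above.
Function NthTbody(ByVal html As String, ByVal n As Long) As String
    Dim parts() As String
    parts = Split(html, "<tbody>")
    ' parts(0) is the text before the first <tbody>, so parts(n) starts the n-th one
    If n < 1 Or n > UBound(parts) Then
        NthTbody = ""   ' no such table
        Exit Function
    End If
    ' Keep only what comes before the matching </tbody>
    NthTbody = Split(parts(n), "</tbody>", 2)(0)
End Function
```

In the answer's macro, the line that extracts the table body could then read sResponse = NthTbody(sResponse, 1), with 1 replaced by the number of the table you want.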