Error extracting data from a website into Excel

Posted: 2018-07-24 13:02:15

Tags: excel vba excel-vba web-scraping extraction

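I'm having trouble setting up data extraction from a website into Excel. I want to extract the exact price of the product into Excel. So far I have the following code: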

Sub GetData()

    Dim objIE As InternetExplorer  'Microsoft Internet Controls library added
    Dim itemEle As Object
    Dim data As String
    Dim y As Integer

    Set objIE = New InternetExplorer
    objIE.Visible = True

    objIE.navigate "https://www.nay.sk/samsung-ue55nu7172"
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop

    y = 1

    For Each itemEle In objIE.document.getElementsByClassName("price")
    data = itemEle.getElementsByClassName("price")(0).innerText
        y = y + 1
    Next
    data = Range("A1").Value
End Sub

What would you suggest?

2 answers:

Answer 0 (score: 2)

Do you want every price?

As an example, you can list the first two like this:

Option Explicit
Public Sub GetInfo()
    Dim sResponse As String, i As Long, html As New HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.nay.sk/samsung-ue55nu7172", False
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With
    sResponse = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
    Dim titles As Object, prices As Object
    With html
        .body.innerHTML = sResponse
        Set titles = .querySelectorAll(".title")
        Set prices = .querySelectorAll(".price")
    End With
    ' Print the title and price of the first two products (indices 0 and 1)
    For i = 0 To 1
        Debug.Print titles.Item(i).innerText & " " & prices.Item(i).innerText
    Next i
End Sub

The loop prints the title and price of each of the first two products to the Immediate window.


In fact, every element on the page with the price class is stored in the prices object.

You can see all of the prices by looping over the length of that object/nodeList:

For i = 0 To prices.Length - 1
    Debug.Print prices.Item(i).innerText
Next i

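Likewise, you can loop over titles up to its .Length, but note that it does not have the same length as prices. There are more prices on the page (or rather, more elements with the price class than with the title class).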
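If the goal is to get the values into the worksheet rather than the Immediate window, one option is to give each nodeList its own loop bound, since the lengths differ. The sketch below assumes the titles and prices objects from GetInfo; NodeListToColumn and WriteColumn are hypothetical helper names, not part of the answer or of the HTML Object Library. Each list is copied into a one-column array and written to the sheet in a single assignment.

```vba
' Hypothetical helper (not from the original answer): copies the innerText of
' every node into a 1-based, one-column 2D array sized by that list's own
' .Length. Returns Empty when the list contains no nodes.
Public Function NodeListToColumn(ByVal nodes As Object) As Variant
    Dim n As Long, i As Long
    Dim result() As Variant

    n = nodes.Length
    If n = 0 Then
        NodeListToColumn = Empty
        Exit Function
    End If

    ReDim result(1 To n, 1 To 1)
    For i = 0 To n - 1
        ' nodeList indices are 0-based; array rows are 1-based
        result(i + 1, 1) = nodes.Item(i).innerText
    Next i
    NodeListToColumn = result
End Function

' Hypothetical helper: writes a column array starting at topLeft.
' Does nothing when values is Empty.
Public Sub WriteColumn(ByVal values As Variant, ByVal topLeft As Range)
    If IsEmpty(values) Then Exit Sub
    topLeft.Resize(UBound(values, 1), 1).Value = values
End Sub
```

For example, at the end of GetInfo you could call WriteColumn NodeListToColumn(titles), ActiveSheet.Range("A1") and WriteColumn NodeListToColumn(prices), ActiveSheet.Range("B1"). Because the two lists have different lengths, row n in column A does not necessarily belong to row n in column B.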


References (VBE > Tools > References):

  1. Microsoft HTML Object Library

Answer 1 (score: 1)

Try this:

Sub GetData()

    Dim objIE As New InternetExplorer   'Microsoft Internet Controls library added
    Dim itemEle As Object
    Dim y As Long

    objIE.Visible = True

    objIE.navigate "https://www.nay.sk/samsung-ue55nu7172"
    ' Wait until the page has finished loading (READYSTATE_COMPLETE = 4)
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop

    y = 1

    ' Write the text of every element with class "price" to column A, one per row
    For Each itemEle In objIE.document.getElementsByClassName("price")
        Cells(y, 1) = itemEle.outerText
        y = y + 1
    Next

    objIE.Quit   ' close the browser window when done
End Sub

Each element with the price class ends up in its own row of column A.

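The cells receive the price as text. If you need the exact price as a number (for sums or comparisons), a helper along these lines could convert it. ParsePrice is a hypothetical name, and the format it expects (comma as the decimal separator, spaces as the thousands separator, currency text around the number) is an assumption about the page, not something checked against nay.sk.

```vba
' Hypothetical helper: turns price text such as "1 299,00 EUR" into a Double.
' Assumption: the comma (or period) is the decimal separator and any
' thousands separator is a space, so only digits and the first
' comma/period are kept. Returns 0 when no digits are found.
Public Function ParsePrice(ByVal priceText As String) As Double
    Dim i As Long, ch As String, cleaned As String
    Dim seenSeparator As Boolean

    For i = 1 To Len(priceText)
        ch = Mid$(priceText, i, 1)
        If ch Like "[0-9]" Then
            cleaned = cleaned & ch
        ElseIf (ch = "," Or ch = ".") And Not seenSeparator And Len(cleaned) > 0 Then
            ' Val only understands "." as the decimal point
            cleaned = cleaned & "."
            seenSeparator = True
        End If
    Next i

    ParsePrice = Val(cleaned)
End Function
```

Inside the loop you would then write Cells(y, 1) = ParsePrice(itemEle.outerText). Val is used rather than CDbl because Val always treats the period as the decimal point, regardless of the Windows regional settings.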

To find the right property of itemEle to use:

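  • Set a breakpoint (F9) on a line inside the For Each loop and run the macro until it stops there
  • Select itemEle with the mouse
  • Press Shift + F9 to open Quick Watch and expand the object to see its properties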

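As an alternative to Quick Watch, a small function can print the candidate properties side by side. DescribeElement is a hypothetical helper; tagName, className, innerText, outerText and outerHTML are standard properties of MSHTML elements.

```vba
' Hypothetical helper: returns several standard MSHTML element properties,
' one per line, so they can be compared in the Immediate window.
Public Function DescribeElement(ByVal el As Object) As String
    DescribeElement = _
        "tagName:   " & el.tagName & vbCrLf & _
        "className: " & el.className & vbCrLf & _
        "innerText: " & el.innerText & vbCrLf & _
        "outerText: " & el.outerText & vbCrLf & _
        "outerHTML: " & el.outerHTML
End Function
```

Call it inside the For Each loop with Debug.Print DescribeElement(itemEle) and compare the output in the Immediate window (Ctrl + G).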