Can't get the text inside a <p> tag using VBA

Time: 2019-01-21 17:47:41

Tags: excel vba web-scraping

I have the following URL

https://www.wavemotion.gr/shop/smartphone-accessories/easy-one-touch-wireless-fast-charging-mount

I am trying to get the availability of this product with the following code

For i = 2 To lastrow

mylink = wks.Cells(i, 2).Value

ie.Navigate mylink

While ie.Busy Or ie.ReadyState < 4: DoEvents: Wend
t = Timer
Do
    DoEvents
    On Error Resume Next

    Set instock = ie.Document.querySelector(".stock.in-stock").innerText

    If instock Is Nothing Then
    Set availability = ie.Document.querySelector(".stock.out-of-stock").innerText
    Else
    Set availability = instock
    End If

    wks.Cells(i, "D") = availability


    If Timer - t > MAX_WAIT_SEC Then Exit Do
    On Error GoTo 0
Loop

Next i

But I get Nothing for

Set instock = ie.Document.querySelector(".stock.in-stock").innerText

I checked the query on

https://try.jsoup.org/

and it works there.

What am I doing wrong here? There is no ID to target, only class names.

<p class="stock in-stock">Διαθέσιμο</p>

2 Answers:

Answer 0: (score: 2)

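What is happening here is that you are using Set to assign a String value, innerText, to an object variable, instock. You get Nothing back because your On Error Resume Next statement suppresses the error message. If you take it out and run the code, you will get a Type Mismatch. What you need to do is split this into one line that assigns the object to the object variable and another line that reads the innerText of the assigned object.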

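' Assign the elements themselves (objects) to the object variables.
Set instock = ie.Document.querySelector(".stock.in-stock")

If instock Is Nothing Then
    Set availability = ie.Document.querySelector(".stock.out-of-stock")
Else
    Set availability = instock
End If

' Read innerText only once an element has actually been found.
If Not availability Is Nothing Then wks.Cells(i, "D") = availability.innerText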
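
To tie this back to the loop in the question, here is a minimal sketch of polling for the stock element until it appears or a timeout expires. The function name GetAvailability and the parameter maxWaitSec are made up for illustration, and the sketch assumes ie is an InternetExplorer instance that has already been told to navigate to the product page.

```vba
' Hypothetical helper: returns the stock text, or "" if no element appears in time.
' Assumes ie has already been navigated to the product page.
Public Function GetAvailability(ByVal ie As Object, ByVal maxWaitSec As Double) As String
    Dim el As Object
    Dim t As Single

    ' Wait for the initial page load to finish.
    While ie.Busy Or ie.ReadyState < 4: DoEvents: Wend

    t = Timer
    Do
        DoEvents
        ' Assign the element itself (an object), not its innerText (a string).
        Set el = ie.Document.querySelector(".stock.in-stock")
        If el Is Nothing Then Set el = ie.Document.querySelector(".stock.out-of-stock")

        If Not el Is Nothing Then
            GetAvailability = el.innerText
            Exit Function
        End If
    Loop While Timer - t < maxWaitSec
End Function
```

Inside the question's For loop this could be called as wks.Cells(i, "D").Value = GetAvailability(ie, MAX_WAIT_SEC) right after ie.Navigate mylink. Unlike the original loop, it stops as soon as an element is found and needs no On Error Resume Next. Note that Timer resets at midnight, so a run spanning midnight would need extra handling.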

Answer 1: (score: 0)

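There is a better, faster way. Use xmlhttp and parse the information from the json stored in one of the script tags. If you are making a lot of requests, you may need to add a wait every x requests in case of throttling/blocking. Note: you can use the same approach with InternetExplorer and so remove many lines of code, although it adds a dependency on another library (.bas).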
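
The wait every x requests mentioned above could be sketched roughly as follows. The helper name PauseEvery and its parameters are hypothetical, and the sketch assumes the code runs inside Excel, since Application.Wait is an Excel method.

```vba
' Hypothetical helper: pause for pauseSec seconds after every nth request.
Public Sub PauseEvery(ByVal requestCount As Long, ByVal n As Long, ByVal pauseSec As Long)
    ' Nested Ifs, because VBA's And does not short-circuit and Mod 0 would raise an error.
    If n > 0 Then
        If requestCount Mod n = 0 Then
            Application.Wait DateAdd("s", pauseSec, Now)
        End If
    End If
End Sub
```

In a loop over many URLs, a call such as PauseEvery requestNumber, 10, 2 after each request would pause for 2 seconds after every 10th request. The values 10 and 2 are only illustrative; suitable numbers depend on how the site throttles.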

You need to install jsonconverter.bas from here, then go to VBE > Tools > References and add a reference to Microsoft Scripting Runtime.

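Option Explicit
' The early-bound HTMLDocument type also needs a reference to
' Microsoft HTML Object Library (VBE > Tools > References).
Public Sub GetStocking()
    Dim json As Object, html As HTMLDocument
    Set html = New HTMLDocument
    ' Fetch the page synchronously and load the response into an HTML document.
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.wavemotion.gr/shop/smartphone-accessories/easy-one-touch-wireless-fast-charging-mount", False
        .send
        html.body.innerHTML = StrConv(.responseBody, vbUnicode)
    End With

    ' Parse the json held in the first application/ld+json script tag.
    Set json = JsonConverter.ParseJson(html.querySelector("script[type='application/ld+json']").innerHTML)

    Debug.Print json("offers")("availability")
End Sub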

[Image: contents of the whole json]


Internet Explorer version:

Option Explicit
Public Sub GetInfo()
    Dim ie As New InternetExplorer, i As Long, s As String, scripts As Object, json As Object
    With ie
        .Visible = False
        .Navigate2 "https://www.wavemotion.gr/shop/smartphone-accessories/easy-one-touch-wireless-fast-charging-mount"

        While .Busy Or .readyState < 4: DoEvents: Wend

        Set scripts = .document.querySelectorAll("script[type='application/ld+json']")

        For i = 0 To scripts.Length - 1
            s = scripts.item(i).innerHTML
            If InStr(s, "availability") > 0 Then
                Set json = JsonConverter.ParseJson(s)
                Exit For
            End If
        Next
        .Quit
        If Not json Is Nothing Then Debug.Print json("offers")("availability")
    End With
End Sub