I am trying to extract the price from this HTML page using the VBA code below.
Here is the HTML snippet:
<div class="box-text box-text-products">
    <div class="title-wrapper">
        <p class="category uppercase is-smaller no-text-overflow product-cat op-7">
            Xikar Lighters
        </p>
        <p class="name product-title">
            <a href="https://www.havanahouse.co.uk/product/xikar-allume-single-jet-flame-racing-cigar-lighter-bluewhite-stripe/">Xikar Allume Single Jet Flame Racing Cigar Lighter – Blue/White Stripe</a>
        </p>
    </div>
    <div class="price-wrapper">
        <span class="price">
            <del>
                <span class="woocommerce-Price-amount amount">
                    <span class="woocommerce-Price-currencySymbol">£</span>48.00
                </span>
            </del>
            <ins>
                <span class="woocommerce-Price-amount amount">
                    <span class="woocommerce-Price-currencySymbol">£</span>45.00
                </span>
            </ins>
        </span>
    </div>
</div>
<!-- box-text -->
</div>
<!-- box -->
</div>
<!-- .col-inner -->
</div>
<!-- col -->
I am using the code below, but I get an error:
For Each oElement In oHtml.getElementsByClassName("woocommerce-Price-amount amount")
    If oElement.getElementsByTagName("del") Then Exit For
    If oElement.innerText <> 0 Then
        Cells(counter, 3) = CDbl(oElement.innerText)
        counter = counter + 1
    End If
Next oElement
Answer 0 (score: 0)
Take a look at the example below:
Option Explicit

Sub Test()

    Dim sUrl As String
    Dim oWS As Worksheet
    Dim i As Long
    Dim sResp As String
    Dim sCont As String
    Dim oMatch

    sUrl = "https://www.havanahouse.co.uk/?post_type=product"
    Set oWS = ThisWorkbook.Sheets(1)
    oWS.Cells.Delete
    i = 1
    Do
        ' Download the current listing page synchronously
        With CreateObject("MSXML2.XMLHTTP")
            .Open "GET", sUrl, False
            .send
            sResp = .ResponseText
        End With
        With CreateObject("VBScript.RegExp")
            .Global = True
            .MultiLine = True
            .IgnoreCase = True
            ' Isolate the product grid; stop if the page has none
            .Pattern = "<div class=""shop-container"">([\s\S]*?)<div class=""container"">"
            With .Execute(sResp)
                If .Count = 0 Then Exit Do
                sCont = .Item(0).SubMatches(0)
            End With
            ' Capture each title block and price block; \s* tolerates whitespace between the two wrappers
            .Pattern = "<div class=""title-wrapper"">([\s\S]*?)</div>\s*<div class=""price-wrapper"">([\s\S]*?)</div>"
            For Each oMatch In .Execute(sCont)
                oWS.Cells(i, 1) = GetInnerText(oMatch.SubMatches(0))
                oWS.Cells(i, 2) = GetInnerText(oMatch.SubMatches(1))
                oWS.Columns.AutoFit
                i = i + 1
                DoEvents
            Next
            oWS.Cells(i, 1).Select
            ' Follow the "next page" link; stop when there is none
            .Pattern = "<a class=""next page-number""[\s\S]*?href=""([^""]*)"""
            With .Execute(sResp)
                If .Count = 0 Then Exit Do
                sUrl = .Item(0).SubMatches(0)
            End With
        End With
    Loop

End Sub

' Strips the tags from an HTML fragment by loading it into a detached <div>
Function GetInnerText(sText As String) As String

    Static oHtmlfile As Object
    Static oDiv As Object

    ' Create the parser objects once and reuse them on later calls
    If oHtmlfile Is Nothing Then
        Set oHtmlfile = CreateObject("htmlfile")
        oHtmlfile.Open
        Set oDiv = oHtmlfile.createElement("div")
    End If
    oDiv.innerHTML = sText
    GetInnerText = oDiv.innerText

End Function
The output for me is as follows:
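Generally, parsing HTML with RegEx is not recommended, so there is disclaimer. The data being processed in this case is quite simple, which is why it is parsed with RegEx. About RegEx: introduction (especially syntax), introduction JS, VB flavor.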
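For comparison, here is a minimal DOM-based sketch that stays closer to the loop in the question. In the posted snippet the `<del>` element is the parent of the old-price span, not a child of it, so the sketch checks the parent node. This avoids evaluating the collection returned by `getElementsByTagName("del")` as a Boolean, which is a likely source of the reported error. The procedure name `SaleAwarePrices` and the inline sample markup are hypothetical, and `&pound;` stands in for the currency symbol. It reuses the late-bound `htmlfile` technique from `GetInnerText` and matches on `className` instead of calling `getElementsByClassName`, on the assumption that the latter may be unavailable on a late-bound document.

```vba
Option Explicit

' Sketch only: SaleAwarePrices and the sample markup below are hypothetical.
Sub SaleAwarePrices()

    Dim oHtmlfile As Object
    Dim oDiv As Object
    Dim oSpan As Object

    ' Same late-bound parsing approach as GetInnerText in the answer
    Set oHtmlfile = CreateObject("htmlfile")
    oHtmlfile.Open
    Set oDiv = oHtmlfile.createElement("div")

    ' Reduced copy of the price block from the question
    oDiv.innerHTML = _
        "<span class=""price"">" & _
        "<del><span class=""woocommerce-Price-amount amount"">&pound;48.00</span></del>" & _
        "<ins><span class=""woocommerce-Price-amount amount"">&pound;45.00</span></ins>" & _
        "</span>"

    For Each oSpan In oDiv.getElementsByTagName("span")
        If oSpan.className = "woocommerce-Price-amount amount" Then
            ' Skip the struck-through price: its parent element is <del>
            If UCase$(oSpan.parentNode.nodeName) <> "DEL" Then
                Debug.Print oSpan.innerText
            End If
        End If
    Next oSpan

End Sub
```

The printed text still carries the currency symbol, so converting it with `CDbl` depends on the regional settings of the machine; stripping the symbol first is one way around that.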