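Using Excel VBA, I have to scrape some data from this website. Since the relevant elements on the page have no id, I can't use HTML.Document.GetElementById. However, I noticed that the relevant information is always stored in a <div> section like the following: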
<div style="padding:7px 12px">Basler Versicherung AG Özmen</div>
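Question:
Is it possible to construct a RegExp, probably run in a loop, that returns the content between <div style="padding:7px 12px"> and the next </div>?
What I have so far is the complete InnerHtml of the container; obviously I still need to add some code to loop over the RegExp that has yet to be built.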
Private Function GetInnerHTML(url As String) As String
    On Error GoTo catch
    'ie is an Internet Explorer object that is already assigned elsewhere
    With ie
        .Navigate url
        'Wait until the page has finished loading (READYSTATE_COMPLETE = 4)
        While .Busy Or .ReadyState <> 4
            DoEvents
        Wend
        GetInnerHTML = .Document.getElementById("cphContent_sectionCoreProperties").innerHTML
    End With
    Exit Function
catch:
    'On failure, return the error number and description instead of the HTML
    GetInnerHTML = Err.Number & " " & Err.Description
End Function
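Answer 0 (score: 2)
I don't think you need a regular expression to find the content on the page. I think you can locate what you're after by using the relative position of the elements.
Code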
Option Explicit

Public Sub GetContent()
    Dim URL As String: URL = "https://www.uid.admin.ch/Detail.aspx?uid_id=CHE-105.805.649"
    Dim IE As Object: Set IE = CreateObject("InternetExplorer.Application")
    Dim Labels As Object
    Dim Label As Variant
    Dim Values As Variant: ReDim Values(0 To 1, 0 To 5000)
    Dim i As Long

    With IE
        .Visible = False
        .Navigate URL
        'Load the page
        Do Until .Busy = False And .ReadyState = 4
            DoEvents
        Loop
        'Find all labels in the table
        Set Labels = .Document.getElementById("cphContent_pnlDetails").getElementsByTagName("label")
        'Iterate the labels, then find the divs relative to these
        For Each Label In Labels
            Values(0, i) = Label.innerText
            Values(1, i) = Label.NextSibling.Children(0).innerText
            i = i + 1
        Next
    End With

    'Dump the values to Excel (skipped when no label was found)
    If i > 0 Then
        ReDim Preserve Values(0 To 1, 0 To i - 1)
        ThisWorkbook.Sheets(1).Range("A1:B" & i).Value = WorksheetFunction.Transpose(Values)
    End If

    'Close IE
    IE.Quit
End Sub
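For completeness, if a regular expression is still wanted, as the question asks, a minimal sketch could look like the following. It is not part of the answer above: ExtractDivContents and PrintDivContents are hypothetical names, the late-bound VBScript.RegExp object is used so that no extra reference is needed, and the pattern assumes the target <div> elements are never nested. The pattern is deliberately a little tolerant, because Internet Explorer may rewrite the style attribute (extra spaces, a trailing semicolon) when it returns innerHTML.

```vba
Option Explicit

'Hypothetical helper: returns the content of every
'<div style="padding:7px 12px">...</div> found in an HTML string.
Public Function ExtractDivContents(ByVal html As String) As Collection
    Dim re As Object: Set re = CreateObject("VBScript.RegExp")
    Dim matches As Object
    Dim m As Object
    Dim result As New Collection

    With re
        .Global = True      'find every occurrence, not only the first
        .IgnoreCase = True  'innerHTML may return upper-case tag names
        'Non-greedy capture up to the next </div>; assumes no nested <div>
        .Pattern = "<div[^>]*padding:\s*7px\s+12px[^>]*>([\s\S]*?)</div>"
    End With

    Set matches = re.Execute(html)
    For Each m In matches
        result.Add Trim$(m.SubMatches(0))
    Next m

    Set ExtractDivContents = result
End Function

'Hypothetical usage: html would be the string returned by GetInnerHTML
Public Sub PrintDivContents(ByVal html As String)
    Dim item As Variant
    For Each item In ExtractDivContents(html)
        Debug.Print item
    Next item
End Sub
```

A regular expression only sees text, so it breaks as soon as the markup changes shape; walking the DOM as in the answers here is usually the more robust choice.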
Answer 1 (score: 2)
Another way to achieve the same result is to use an XMLHTTP request. Give it a try:
Sub Fetch_Data()
    Dim S As String, I As Long

    'Fetch the page source with a synchronous GET request
    With New XMLHTTP60
        .Open "GET", "https://www.uid.admin.ch/Detail.aspx?uid_id=CHE-105.805.649", False
        .send
        S = .responseText
    End With

    'Parse the response and write each label and its value to the active sheet
    With New HTMLDocument
        .body.innerHTML = S
        With .querySelectorAll("#cphContent_sectionCoreProperties label[id^='cphContent_ct']")
            For I = 0 To .Length - 1
                Cells(I + 1, 1) = .Item(I).innerText
                Cells(I + 1, 2) = .Item(I).NextSibling.FirstChild.innerText
            Next I
        End With
    End With
End Sub
Before running the script above, add references to the following libraries (Tools > References in the VBA editor):
Microsoft HTML Object Library
Microsoft XML, V6.0