我想使用VBA InStr函数抓取该URL https://www.realtor.com/realestateandhomes-search/06510,并提取所有带有此子字符串“ 06510”的URL
这是我一直在努力的示例代码。
Option Explicit
Sub GetLinks()
'
'To use HTMLDocument you need to set a reference to Tools -> References -> Microsoft HTML Object Library
Dim HTML As New HTMLDocument
Dim http As Object
Dim links As Object
Dim link As HTMLHtmlElement
Dim counter As Long
Dim website As Range
Dim LastRange As Range
Dim row As Long
Dim continue As Boolean
Dim respHead As String
Dim lRow As Long
Application.ScreenUpdating = False
' The row where website addresses start
row = 30
continue = True
lRow = Cells(Rows.count, 1).End(xlUp).row + 1
' XMLHTTP gives errors where ServerXMLHTTP does not
' even when using the same URL's
'Set http = CreateObject("MSXML2.XMLHTTP")
Set http = CreateObject("MSXML2.ServerXMLHTTP.6.0")
Do While continue
' Could set this to first cell with URL then OFFSET columns to get next web site
Set website = Range("A" & row)
Set LastRange = Range("B" & lRow)
If Len(website.Value) < 1 Then
continue = False
Exit Sub
End If
If website Is Nothing Then
continue = False
End If
'Debug.Print website
With http
On Error Resume Next
.Open "GET", website.Value, False
.send
' If Err.Num is not 0 then an error occurred accessing the website
' This checks for badly formatted URL's. The website can still return an error
' which should be checked in .Status
'Debug.Print Err.Number
' Clear the row of any previous results
Range("B" & row & ":e" & row).Clear
' If the website sent a valid response to our request
If Err.Number = 0 Then
If .Status = 200 Then
HTML.body.innerHTML = http.responseText
Set links = HTML.getElementsByTagName("a")
For Each link In links
If InStr(link.outerHTML, "06510") Then
LastRange.Value = link.href
End If
Next
End If
Set website = Nothing
Else
'Debug.Print "Error loading page"
LastRange.Value = "Error with website address"
End If
On Error GoTo 0
End With
row = row + 1
Loop
Application.ScreenUpdating = True
End Sub
检查页面后,以下是要提取的URL类型的示例-https://www.realtor.com/realestateandhomes-detail/239-Bradley-St_New-Haven_CT_06510_M36855-92189。任何帮助将不胜感激
以简化的方式使用QHarr的代码...
Sub GetLinks()
Dim url As String, links_count As Integer
Dim j As Integer, row As Integer
Dim XMLHTTP As Object, html As Object
'Dim tr_coll As Object, tr As Object
'Dim elements As Object
Dim i As Long, allLinksOfInterest As Object
'Dim td_coll As Object, td As Object, td_col, objT
url = "https://www.realtor.com/realestateandhomes-search/06510"
Set XMLHTTP = CreateObject("MSXML2.XMLHTTP")
XMLHTTP.Open "GET", url, False
XMLHTTP.send
Set html = CreateObject("htmlfile")
html.body.innerHTML = XMLHTTP.responseText
Set allLinksOfInterest = html.querySelectorAll("[href*='06510']")
For i = 0 To allLinksOfInterest.Length - 1
Debug.Print allLinksOfInterest.Item(i).href
Next
End Sub
请检查我是否有任何遗漏。我仍然收到错误“对象不支持此属性或方法”
答案 0 :(得分:1)
If InStr(link.outerHTML, "06510") Then
在上面的代码中,InStr函数的使用类似于布尔函数。但是它不是布尔值,而是返回整数。因此,您应该在函数后添加比较运算符。可能像:
If InStr(link.outerHTML, "06510")>0 Then
答案 1 :(得分:1)
在所有Instr
标签的循环期间,请勿在整个节点externalHTML上使用a
。有时需要这样做,但不应该是其中之一(希望如此)。
您想将attribute = value css选择器与contains
,*
运算符一起使用。专门用于匹配属性值中的子字符串的目的。这样更有效。
Set allLinksOfInterest = html.querySelectorAll("[href*='06510']")
所以
Dim i As Long, allLinksOfInterest As Object
Set allLinksOfInterest = html.querySelectorAll("[href*='06510']")
For i = 0 To allLinksOfInterest.Length - 1
Debug.Print allLinksOfInterest.Item(i).href
Next
属性=包含包含运算符的值:
[attr * = value]
表示属性名称为attr的元素,其属性 值在字符串中至少包含一个值。
VBA:
当前产生26个链接。所有都是相对链接,因此需要添加域,如循环所示。有些是重复项,因此请考虑将其作为键添加到字典中,以便删除重复项。
Option Explicit
Public Sub GetLinks()
Dim html As HTMLDocument
Set html = New HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.realtor.com/realestateandhomes-search/06510", False
.send
html.body.innerHTML = .responseText
End With
Dim i As Long, allLinksOfInterest As Object
Set allLinksOfInterest = html.querySelectorAll("[href*='06510']")
For i = 0 To allLinksOfInterest.Length - 1
Debug.Print Replace$(allLinksOfInterest.item(i).href,"about:","https://www.realtor.com")
Next
End Sub