我如何使用excel vba将网站上的特定数据(名称,详细信息)提取到excel中?
下面我试图获得处理器和保修:
Option Explicit
Sub GetData()
Dim objIE As InternetExplorer
Dim itemELE As Object
Dim html As IHTMLDocument
Dim Processor As String
Dim warranty As String
Dim y As Integer
'start a new browser instance
Set objIE = New InternetExplorer
'make browser visible
objIE.Visible = True
'navigate to page with needed data
objIE.navigate "https://www.harveynorman.com.sg/computers-tablets-and-gaming/computers/laptops/"
'wait for page to load
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
y = 1
For Each itemELE In objIE.document.getElementsByClassName("expandabaleContent")
Processor = itemELE.getElementsByTagName("d1")(0).innerText
warranty = itemELE.getElementsByClassName("d1")(0).getElementsByTagName("a")(0).textContent
Sheets("Sheet1").Range("A" & y).Value = Processor
Sheets("Sheet1").Range("B" & y).Value = warranty
y = y + 1
Next
End Sub
页面截图:
答案 0 :(得分:4)
对于显示的页面(图像中的 ),您可以发出XMLHTTP (XHR) GET request来获取产品信息,而无需打开缓慢的IE浏览器实例。
处理器和保修信息:
如果您查看该页面,则有关处理器和保修的信息将与类名facetedResults-feature-list
您可以看到类名,然后看到一个dl
标记,其中包含一个dt
标记,该标记具有同级dd
标记。这些同级dd
标签中有两个与处理器和保修信息相关联。
在这种情况下,我使用CSS选择器来捕获所有这些dd
标签,这些标签可以简化,以忽略同级dt
和父dl
标签,仅使用:>
.facetedResults-feature-list dd
"."
是class selector。上面的CSS组合选择说,在类dd
的元素中获取facetedResults-feature-list
标签
产品标题信息:
titles
我使用了另一个CSS选择器:
.facetedResults-title
这是类facetedResults-title
的元素。其中包含产品标题。
在表格中写出产品标题,处理器和保修信息:
一些数学运算显示出,处理器信息每14重复一次,如果我在处理器的索引中加8,则会得到保修信息。您会看到如何写出每一个细节,这些细节发生在每隔14个重复的索引处。我将dd
元素的nodeList与titles
的循环组合在一起写出工作表。
VBA:
Option Explicit
Public Sub GetInfo()
Dim sResponse As String, i As Long, html As New HTMLDocument
Application.ScreenUpdating = False
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.lenovo.com/sg/en/laptops/c/LAPTOPS", False
.send
sResponse = StrConv(.responseBody, vbUnicode)
End With
sResponse = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
Dim titles As Object, targetedInfo As Object, rowCounter As Long
With html
.body.innerHTML = sResponse
Set titles = .querySelectorAll(".facetedResults-title")
Set targetedInfo = .querySelectorAll(".facetedResults-feature-list dd")
End With
With Worksheets("Sheet1")
For i = 0 To targetedInfo.Length - 1
If i Mod 14 = 0 Then
rowCounter = rowCounter + 1
.Cells(rowCounter, 1) = titles(rowCounter - 1).innerText
.Cells(rowCounter, 2) = targetedInfo(i).innerText
.Cells(rowCounter, 3) = targetedInfo(i + 8).innerText
End If
Next i
End With
Application.ScreenUpdating = True
End Sub
输出示例:
CSS选择器:
产品信息与'expandableContent facetedResults-expandableContent-features expandableContent-is-collapsed facetedResults-expandableContent-69'
类名
价格与'expandableContent facetedResults-expandableContent-price expandableContent-is-collapsed'
类名称相关。
您可以通过传统的.getElementsByClassName
选择它们,然后遍历集合,或者,就我而言,使用CSS选择器为类做同样的事情,然后遍历返回的{{ 1}}。
nodeList
与
相同.getElementsByClassName("expandableContent facetedResults-expandableContent-features expandableContent-is-collapsed facetedResults-expandableContent-69")
.querySelectorAll(".expandableContent.facetedResults-expandableContent-features.expandableContent-is-collapsed.facetedResults-expandableContent-69")
是class selector。
标题与类别"."
VBA:
facetedResults-title
所需参考(VBE>工具>参考):
答案 1 :(得分:2)
Qharr已经提供了一些不错的选择,但是如果您仍然想尝试IE,请参见下面的代码
Option Explicit
Sub GetData()
Dim objIE As InternetExplorer
Dim itemELE As Object
Dim html As IHTMLDocument
Dim Processor As String
Dim warranty As String
Dim y As Integer
'start a new browser instance
Set objIE = New InternetExplorer
'make browser visible
objIE.Visible = True
'navigate to page with needed data
objIE.navigate "https://www.lenovo.com/sg/en/laptops/c/LAPTOPS"
'wait for page to load
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
Application.Wait Now + TimeSerial(0, 0, 3)
y = 1
For Each itemELE In objIE.document.getElementsByClassName("facetedResults-feature-list")
If InStr(1, itemELE.className, "bundleList", vbTextCompare) = 0 Then
Application.Wait Now + TimeSerial(0, 0, 2)
Processor = itemELE.getElementsByTagName("dl")(0).innerText
warranty = itemELE.getElementsByTagName("dl")(4).innerText
Sheets("Sheet1").Range("A" & y).Value = Processor
Sheets("Sheet1").Range("B" & y).Value = warranty
y = y + 1
End If
Next
End Sub
结果