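I am trying to automate a web scraping tool with VBA to collect price data for certain items. I am very new to VBA and have been trying to build my code from answers to similar questions on here, but I am stuck on a "Type mismatch" error. I use this to open IE, which works fine: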
Dim appIE As Object
Set appIE = CreateObject("internetexplorer.application")
With appIE
.Navigate "https://grocery.walmart.com/"
.Visible = True
End With
Do While appIE.Busy
DoEvents
Loop
However, I now want to find the prices, e.g. $1.67 for the Colgate and $2.78 for the Nature Valley, in the following code:
<span data-automation-id="items">
<div class="CartItem__itemContainer___3vA-E" tabindex="-1" data-automation-id="cartItem">
<div class="CartItem__itemInfo___3rgQd">
<span class="TileImage__tileImage___35CNo">
<div class="TileImage__imageContainer___tlQZb">
<img alt="1 of C, o" src="https://i5.walmartimages.com/asr/36829cef-43f2-4d21-9d5e-10aa9def01dd_7.04089903cc0038b3dac3c204ef7e417e.png?odnHeight=150&odnWidth=150&odnBg=ffffff" class="TileImage__image___3MrIo" data-automation-id="image" aria-hidden="true">
</div><span data-automation-id="quantity" class="TileImage__quantity___1rgG4 hidden__audiblyHidden___RoAkK" role="button" aria-label="1 of C, select to change quantities">
1</span></span><div class="CartItem__name___2RJs5">
<div data-automation-id="name" tabindex="0" role="button" aria-label="C button, Select to change quantities">
Colgate Cavity Protection Fluoride Toothpaste - 6 oz</div><span data-automation-id="list-price" class="ListPrice__listPrice___1x8TM" aria-label="1 dollar and 67 cents each">
$1.67 each</span><a class="CartItem__detailsLink___2ts9b" aria-label="Colgate Cavity Protection Fluoride Toothpaste - 6 oz" tabindex="0" href="/ip/Colgate-Cavity-Protection-Fluoride-Toothpaste---6-oz/49714957">
View details</a></div><span class="Price__groceryPriceContainer___19Jim CartItem__price___2ADX6" data-automation-id="price" aria-label="1 dollar and 67 cents ">
<sup class="Price__currencySymbol___3Ye7d">
$</sup><span class="Price__wholeUnits___lFhG5" data-automation-id="wholeUnits">
1</span><sup class="Price__partialUnits___1VX5w" data-automation-id="partialUnits">
67</sup></span></div><div></div></div><div class="CartItem__itemContainer___3vA-E" tabindex="-1" data-automation-id="cartItem">
<div class="CartItem__itemInfo___3rgQd">
<span class="TileImage__tileImage___35CNo">
<div class="TileImage__imageContainer___tlQZb">
<img alt="1 of N, a" src="https://i5.walmartimages.com/asr/775482d5-a136-4ca3-9353-28646ec999c3_1.d861ce7abd9797cbafec2cd2a4b24874.jpeg?odnHeight=150&odnWidth=150&odnBg=ffffff" class="TileImage__image___3MrIo" data-automation-id="image" aria-hidden="true">
</div><span data-automation-id="quantity" class="TileImage__quantity___1rgG4 hidden__audiblyHidden___RoAkK" role="button" aria-label="1 of N, select to change quantities">
1</span></span><div class="CartItem__name___2RJs5">
<div data-automation-id="name" tabindex="0" role="button" aria-label="N button, Select to change quantities">
Nature Valley Granola Bars Sweet and Salty Nut Cashew 6 Bars - 1.2 oz</div><span data-automation-id="list-price" class="ListPrice__listPrice___1x8TM" aria-label="2 dollars and 78 cents each">
$2.78 each</span><a class="CartItem__detailsLink___2ts9b" aria-label="Nature Valley Granola Bars Sweet and Salty Nut Cashew 6 Bars - 1.2 oz" tabindex="0" href="/ip/Nature-Valley-Granola-Bars-Sweet-and-Salty-Nut-Cashew-6-Bars---1.2-oz/10311347">
View details</a></div><span class="Price__groceryPriceContainer___19Jim CartItem__price___2ADX6" data-automation-id="price" aria-label="2 dollars and 78 cents ">
<sup class="Price__currencySymbol___3Ye7d">
$</sup><span class="Price__wholeUnits___lFhG5" data-automation-id="wholeUnits">
2</span><sup class="Price__partialUnits___1VX5w" data-automation-id="partialUnits">
78</sup></span></div><div></div></div>
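My instinct (as a true beginner) is to find the div class sections above, then search for aria-label and copy the text that follows it, but I feel that this would be a lot of trouble and could produce a lot of errors if that div class is repeated elsewhere on the page.
Any help on how I should proceed (and on whether this is a good idea) would be much appreciated. Thanks!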
Answer 0 (score: 0)
You can select all of the prices with a CSS selector that targets the class attribute:
[class='Price__groceryPriceContainer___19Jim CartItem__price___2ADX6']
You apply the CSS selector through the querySelectorAll method of document, which returns a nodeList.
You can also get a collection with:
.document.getElementsByClassName("Price__groceryPriceContainer___19Jim CartItem__price___2ADX6")
Code outline:
Option Explicit
Public Sub TEST()
    Dim appIE As Object
    Dim priceList As Object, nameList As Object, i As Long, ws As Worksheet, lastRow As Long
    Set appIE = CreateObject("internetexplorer.application")
    Set ws = ActiveSheet
    With appIE
        .navigate "https://grocery.walmart.com/" 'Travel to homepage
        .Visible = True 'Show browser window
        Do While .Busy = True Or .readyState <> 4: DoEvents: Loop 'Wait for page to have loaded
        'Code to get your basket ready
        lastRow = GetLastRow(ws, 1) 'Last used row in column 1, so new rows are appended below it
        Set priceList = .document.querySelectorAll("[class='Price__groceryPriceContainer___19Jim CartItem__price___2ADX6']") 'Select elements by their class attribute (match on basket item prices)
        Set nameList = .document.querySelectorAll("[data-automation-id='name']") 'Match on basket item names
        For i = 0 To priceList.Length - 1 'Loop the nodeList of matched elements
            With ws
                .Cells(lastRow + i + 1, 1) = nameList.item(i).innerText 'Name of each matched element
                .Cells(lastRow + i + 1, 2) = Now 'Timestamp of the reading
                .Cells(lastRow + i + 1, 3) = priceList.item(i).innerText 'Price of each matched element
            End With
        Next i
    End With
End Sub
Public Function GetLastRow(ByVal ws As Worksheet, Optional ByVal columnNumber As Long = 1) As Long
    With ws
        GetLastRow = .Cells(.Rows.Count, columnNumber).End(xlUp).Row
    End With
End Function
Fixed basket items:
Toothpaste:
If the items in the basket stay fixed and their prices are updated in the basket over time, you can track changes in the toothpaste price, for example with the CSS selector:
.CartItem__name___2RJs5 + span
So:
Debug.Print .document.querySelector(".CartItem__name___2RJs5 + span").innerText
Or:
Debug.Print .document.querySelectorAll("[class='Price__groceryPriceContainer___19Jim CartItem__price___2ADX6']").item(0).innerText
This last one uses the class attribute to return a nodeList of all matching elements (your basket) and accesses the first item (the toothpaste) by index 0.
Or you can use the .querySelector method, which returns the first match, i.e. index 0:
Debug.Print .document.querySelector("[class='Price__groceryPriceContainer___19Jim CartItem__price___2ADX6']").innerText
My code locates elements by using CSS selectors (the page's styling) to match their class attribute. All of your basket item prices have the class attribute Price__groceryPriceContainer___19Jim CartItem__price___2ADX6. So my code pulls back a nodeList (a bit like an array) of the elements that have this class attribute, then loops over the length of the nodeList to access each element by index (starting at 0). The innerText property returns the literal string value of the element, i.e. the price.
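Because innerText returns a string, you may want a numeric value before storing prices for comparison over time. The following is only a minimal sketch and not part of the answer above: ParsePrice and DemoParsePrice are hypothetical names, and the sketch assumes the text contains a decimal point, as in the "$1.67 each" text of the list-price element in the sample markup. The split price element keeps dollars and cents in separate child elements, so its innerText may not contain a decimal point and would need different handling.

```vba
Option Explicit

' Hypothetical helper (an assumption, not from the original answer):
' converts text such as "$1.67 each" to a Double by keeping only the
' digits and the decimal point, then handing the result to Val.
Public Function ParsePrice(ByVal priceText As String) As Double
    Dim i As Long
    Dim ch As String
    Dim digits As String

    For i = 1 To Len(priceText)
        ch = Mid$(priceText, i, 1)
        ' Keep the characters 0-9 and "." and drop everything else
        If ch Like "[0-9.]" Then digits = digits & ch
    Next i

    ' Val recognises only "." as the decimal separator and returns 0 for ""
    ParsePrice = Val(digits)
End Function

' Small demonstration using the two list-price strings from the sample markup
Public Sub DemoParsePrice()
    Debug.Print ParsePrice("$1.67 each")
    Debug.Print ParsePrice("$2.78 each")
End Sub
```

In the loop above, you could wrap the price text in the same call, for example ParsePrice(someElement.innerText), where someElement is whichever element you have confirmed holds text in the "$1.67 each" form.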