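I am using the code below to extract data, but because of the li element with class "clear", Access VBA does not copy the complete data. What do I need to do to get past the clear tag? My code is shown below.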
Set my_data = html4.getElementsByClassName("right_box")
For Each Item In my_data
    Set my_data1 = Item.getElementsByTagName("li")
    For Each item1 In my_data1
        If item1.innerHTML Like "*href*" Then
            href11 = item1.getElementsByTagName("a")
        Else
            Exit For
        End If
    Next item1
Next Item
The HTML is given below.
<div class="right_box"> <div class="right_box_title"> <div class="title_left"></div> <a class="title_right" href="products.php?disp=1"></a> </div> <ul class="pro_list">
<li> <a title="NEW Handbags Handbags7" href="/index.php/NEW-Handbags-Handbags72-p20253745.html" class="pic"><img title="NEW Handbags Handbags7" alt="NEW Handbags Handbags7" src="/image.php?pic=2017-08-27%2F2017082722393889955047.jpg&style=1&folder=uploadImage%2F" border="0" /></a>
</li>
<li class="clear"></li>
<li>
<a title="NEW Handbags Handbags6" href="/index.php/NEW-Handbags-Handbags6-p2025361.html" class="pic"><img title="NEW Handbags Handbags6" alt="NEW Handbags Handbags6" src="/image.php?pic=2017-08-27%2F201708272239272285106.jpg&style=1&folder=uploadImage%2F" border="0" /></a>
</li>
The code above stops collecting data at the li with class "clear".
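Answer 0 (score: 1)
The CSS class should not prevent you from collecting the data. The loop stops early because the Else branch runs Exit For on the first li that contains no href, which here is the empty li with class "clear".
Note: you should set a reference to the Microsoft HTML Object Library.
Because item1.getElementsByTagName("a") returns an object rather than a scalar value, this line of code should fail:
href11 = item1.getElementsByTagName("a")
Here is a better pattern for iterating over the anchor tags: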
Dim a As HTMLAnchorElement
Set my_data = html4.getElementsByClassName("right_box")(0)
For Each a In my_data.getElementsByTagName("a")
Debug.Print a.href
Next
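If the nested loop from the question is kept, a minimal sketch of it that skips an li without anchors, rather than leaving the loop, might look like the following. The procedure name PrintProductLinks and the variables box, li and links are illustrative and not part of the original code; the sketch assumes html4 is an already loaded HTMLDocument and that the Microsoft HTML Object Library is referenced.

```vba
' Sketch only: assumes a reference to the Microsoft HTML Object Library
' and that html4 already holds the loaded page, as in the question.
Sub PrintProductLinks(ByVal html4 As HTMLDocument)
    Dim box As Object    ' each element with class "right_box"
    Dim li As Object     ' each <li> inside that element
    Dim links As Object  ' anchors found inside the current <li>

    For Each box In html4.getElementsByClassName("right_box")
        For Each li In box.getElementsByTagName("li")
            Set links = li.getElementsByTagName("a")
            ' <li class="clear"> contains no anchors: skip it and keep looping
            ' (the question's Exit For ended the loop at this point).
            If links.Length > 0 Then
                Debug.Print links.Item(0).href
            End If
        Next li
    Next box
End Sub
```

Set is used for the collection because getElementsByTagName returns an object, and the href is read from a single anchor rather than from the collection.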
Answer 1 (score: 0)
Remove the li with the clear class and handle the layout with CSS on the other li elements.
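Answer 2 (score: 0)
In this case you can use querySelectorAll and pass it an appropriate selector, e.g. div[class='right_box'] ul[class='pro_list'] li a, which selects every a inside an li inside the ul with class pro_list inside the div with class right_box. For more information about selectors, see e.g. https://secure.php.net/manual/en/function.explode.php. HTH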
Set html4 = ie.document
Dim selector As String
selector = "div[class='right_box'] ul[class='pro_list'] li a"
Dim anchors As IHTMLDOMChildrenCollection
Set anchors = html4.querySelectorAll(selector)
Dim anchor, i
If Not anchors Is Nothing Then
For i = 0 To anchors.Length - 1
Set anchor = anchors.Item(i)
Debug.Print "anchor-" & i & " href: " & anchor.href
Next i
End If
Output:
anchor-0 href: file:///C:/index.php/NEW-Handbags-Handbags72-p20253745.html
anchor-1 href: file:///C:/index.php/NEW-Handbags-Handbags6-p2025361.html
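As a possible variation, the same lookup can be written with CSS class selectors. An attribute selector such as [class='right_box'] matches only when the class attribute is exactly that string, while .right_box also matches elements that carry additional classes. The sketch below is an assumption-laden rewrite of the code above, not the answer author's code; the procedure name PrintAnchorsByClassSelector is hypothetical, and it again assumes a reference to the Microsoft HTML Object Library and an already loaded html4.

```vba
' Sketch only: same approach as above, using class selectors.
' Assumes html4 is a loaded HTMLDocument (Microsoft HTML Object Library).
Sub PrintAnchorsByClassSelector(ByVal html4 As HTMLDocument)
    Dim anchors As IHTMLDOMChildrenCollection
    Dim i As Long

    ' .right_box and .pro_list also match elements with extra classes.
    Set anchors = html4.querySelectorAll("div.right_box ul.pro_list li a")

    If Not anchors Is Nothing Then
        ' Index-based loop, as in the answer's code.
        For i = 0 To anchors.Length - 1
            Debug.Print "anchor-" & i & " href: " & anchors.Item(i).href
        Next i
    End If
End Sub
```

Because the selector only returns a elements, the empty li with class "clear" never enters the result and needs no special handling.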