VBA HTML web scraping mismatch

Time: 2018-12-18 02:29:39

Tags: excel vba excel-vba web-scraping

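I was given a piece of code that is supposed to fetch item listings and their prices from eBay. It seems to work for the most part, except that the prices are mismatched in places (there are more prices than listings). Why does this happen?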

Public IE As New SHDocVw.InternetExplorer

Sub GetData()

Dim HTMLdoc As MSHTml.HTMLDocument
Dim othwb As Variant
Dim objShellWindows As New SHDocVw.ShellWindows

Set IE = CreateObject("internetexplorer.application")

    With IE
        .Visible = False
        .Navigate "https://www.ebay.com/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=brooks+brothers&_sacat=1059&LH_TitleDesc=0&_osacat=1059&_odkw=brooks+brothers&LH_TitleDesc=0"
        While .Busy Or .ReadyState <> 4: DoEvents: Wend


            Set HTMLdoc = IE.Document
            ProcessHTMLPage HTMLdoc

        .Quit
    End With


End Sub

Sub ProcessHTMLPage(HTMLPage As MSHTml.HTMLDocument)

Dim HTMLItem As MSHTml.IHTMLElement
Dim HTMLItems As MSHTml.IHTMLElementCollection
Dim HTMLInput As MSHTml.IHTMLElement
Dim rownum As Long

rownum = 1

Set HTMLItems = HTMLPage.getElementsByClassName("s-item__title")

For Each HTMLItem In HTMLItems

        Cells(rownum, 1).Value = HTMLItem.innerText
        rownum = rownum + 1

Next HTMLItem

rownum = 1

Set HTMLItems = HTMLPage.getElementsByClassName("s-item__price")

For Each HTMLItem In HTMLItems

        Cells(rownum, 2).Value = HTMLItem.innerText
        rownum = rownum + 1

Next HTMLItem


End Sub

1 answer:

Answer 0: (score: 1)

First, change the selector so that it is restricted to the main part of the listings, which avoids the recently viewed items. Then you can process the listings one at a time. In the example below, I capture all the listed prices (excluding strikethrough prices) into an array and store it in a collection together with the associated title. You can ReDim Preserve the array dimension, or simply take the first item of the array to get the first price.
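
A minimal sketch of the per-listing approach described above, not the answerer's original code. The two loops in the question count every s-item__title and every s-item__price on the page independently, so a listing with more than one price element, or an item outside the main results, shifts the second column. Reading both values from the same listing container keeps them paired. The id "srp-river-results" and the container class "s-item" are assumptions about eBay's markup and may not match the live page, and ProcessListings is a hypothetical name. Strikethrough prices are not filtered here because their markup is not shown in the post.

```vba
Option Explicit

' Sketch only. Assumptions (not from the original post):
'   - the main results sit under an element with id "srp-river-results"
'   - each listing is wrapped in an element with class "s-item"
' Requires a reference to "Microsoft HTML Object Library", as in the question.
Sub ProcessListings(HTMLPage As MSHTML.HTMLDocument)

    Dim listings As Object          ' node list from querySelectorAll
    Dim listing As Object           ' one listing container (late bound)
    Dim titles As Object
    Dim prices As Object
    Dim priceList() As String       ' all prices found inside one listing
    Dim results As New Collection   ' one (title, prices) pair per listing
    Dim entry As Variant
    Dim i As Long, j As Long, rownum As Long

    ' Restrict the search to the main results to skip recently viewed items.
    Set listings = HTMLPage.querySelectorAll("#srp-river-results .s-item")

    ' Index-based loop: For Each over this node list is unreliable in VBA.
    For i = 0 To listings.Length - 1
        Set listing = listings.Item(i)
        Set titles = listing.getElementsByClassName("s-item__title")
        Set prices = listing.getElementsByClassName("s-item__price")

        ' Skip containers that lack a title or a price.
        If titles.Length > 0 And prices.Length > 0 Then
            ReDim priceList(0 To prices.Length - 1)
            For j = 0 To prices.Length - 1
                priceList(j) = prices.Item(j).innerText
            Next j
            ' The array is copied into the Variant, so each entry is independent.
            results.Add Array(titles.Item(0).innerText, priceList)
        End If
    Next i

    ' One row per listing: title in column 1, its prices joined in column 2.
    rownum = 1
    For Each entry In results
        Cells(rownum, 1).Value = entry(0)
        Cells(rownum, 2).Value = Join(entry(1), " | ")
        rownum = rownum + 1
    Next entry

End Sub
```

To try it, call ProcessListings HTMLdoc from GetData in place of ProcessHTMLPage HTMLdoc. If only the first price is wanted, entry(1)(0) can be written to column 2 instead of the joined string.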