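Unable to extract data from a stubborn web page using VBA

Time: 2017-02-06 14:14:46

Tags: vba web-scraping

Hope you're doing well. The website I'm trying to scrape category names from looks very simple when you inspect its elements, but the parser I wrote could not extract the data. I only want to scrape the 7 category names from that page. I tried every angle I could think of and failed. I'd be very grateful if someone could point out what I'm doing wrong. Thanks in advance. FYC, I've pasted the code I tried here.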

func addImageAtRadomPosition() {
    let imageView = UIImageView()
    imageView.backgroundColor = UIColor.red
    imageView.translatesAutoresizingMaskIntoConstraints = false
    self.view.addSubview(imageView)

    let imageSize: CGFloat = 50.0
    let screenHeight = UIScreen.main.bounds.size.height
    let screenWidth = UIScreen.main.bounds.size.width

    // Get random values which will be used for top and left constraints
    let randomYPosition = Int(arc4random_uniform(UInt32(screenHeight - imageSize)))
    let randomXPosition = Int(arc4random_uniform(UInt32(screenWidth - imageSize)))

    let topConstraint = NSLayoutConstraint(item: imageView, attribute: .top, relatedBy: .equal, toItem: self.topLayoutGuide, attribute: .bottom, multiplier: 1, constant: CGFloat(randomYPosition))
    let leftContraint = NSLayoutConstraint(item: imageView, attribute: .left, relatedBy: .equal, toItem: self.view, attribute: .left, multiplier: 1, constant: CGFloat(randomXPosition))
    let widthConstraint = NSLayoutConstraint(item: imageView, attribute: NSLayoutAttribute.width, relatedBy: NSLayoutRelation.equal, toItem: nil, attribute: NSLayoutAttribute.notAnAttribute, multiplier: 1, constant: imageSize)
    let heightConstraint = NSLayoutConstraint(item: imageView, attribute: NSLayoutAttribute.height, relatedBy: NSLayoutRelation.equal, toItem: nil, attribute: NSLayoutAttribute.notAnAttribute, multiplier: 1, constant: imageSize)

    view.addConstraints([topConstraint, leftContraint, widthConstraint, heightConstraint])
}

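2 answers:

Answer 0 (score: 2)

Here is one possible solution. I use the Internet Explorer object instead of MSXML. I was able to retrieve the data from the page, and it was quite fast.

Here is the complete code: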

Option Explicit

#If VBA7 Then
    ' dwMilliseconds is a 32-bit DWORD, so Long is the right type even on 64-bit Office
    Public Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#Else
    Public Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#End If

Sub ItemName()
On Error GoTo errhand:
    Dim ie As Object: Set ie = CreateObject("InternetExplorer.Application")
    Dim topics As Object, topic As Object
    Dim i As Byte

    With ie
        .Visible = False
        .Navigate "http://www.bjs.com/tv--electronics.category.3000000000000144985.2002193"
        Sleep 500 ' Wait for the page to start loading
        ' Poll until IE reports READYSTATE_COMPLETE (4) and is idle, or give up after 100 tries
        Do Until (.readyState = 4 And Not .Busy) Or i >= 100
            Sleep 100
            DoEvents
            i = i + 1
        Loop
    End With

    Set topics = ie.document.getElementsByClassName("name ng-binding")

    For Each topic In topics
        'Print out the element's innertext
        Debug.Print topic.innertext
    Next

    ie.Quit
    Set ie = Nothing
    Exit Sub

errhand:
    Debug.Print Err.Number, Err.Description
    ' ie is still Nothing if CreateObject itself failed
    If Not ie Is Nothing Then ie.Quit
    Set ie = Nothing
End Sub
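
One caveat: the `ng-binding` class suggests the names are filled in by AngularJS, so the elements may only appear a moment after the browser reports the page as loaded. Below is a minimal sketch of a polling helper for that case. It assumes the `Sleep` declaration above lives in the same module, and `WaitForClass` and its `maxTries` parameter are hypothetical names, not part of the original answer.

```vba
' Hypothetical helper: polls the loaded IE document until at least one
' element with the given class name(s) exists, or maxTries polls have passed.
' Call it only after the navigation wait loop, so ie.document is available.
Function WaitForClass(ByVal ie As Object, ByVal className As String, _
                      Optional ByVal maxTries As Long = 50) As Object
    Dim found As Object
    Dim tries As Long

    Do
        Set found = ie.document.getElementsByClassName(className)
        If found.Length > 0 Then Exit Do
        Sleep 100   ' Give the page's scripts time to render
        DoEvents
        tries = tries + 1
    Loop While tries < maxTries

    ' On timeout this is an empty collection, so For Each simply does nothing
    Set WaitForClass = found
End Function
```

If that fits the page, the line `Set topics = ie.document.getElementsByClassName("name ng-binding")` could become `Set topics = WaitForClass(ie, "name ng-binding")`.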

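Answer 1 (score: 0)

Because the site's content is generated dynamically, an xmlhttp request cannot capture the rendered page source. Selenium is a good way around this, since it copes well with JavaScript-heavy sites. In the script below I use Selenium only to fetch the page source. Once I have it, I revert to the usual VBA approach to finish the job.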

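' Requires references to the Selenium Type Library (SeleniumBasic)
' and the Microsoft HTML Object Library
Sub Grabbing_item()
    Dim driver As New ChromeDriver, html As New HTMLDocument
    Dim post As Object
    Dim x As Long ' Output row counter

    With driver
        .Get "http://www.bjs.com/tv--electronics.category.3000000000000144985.2002193"
        ' Copy the JavaScript-rendered markup into an MSHTML document
        html.body.innerHTML = .ExecuteScript("return document.body.innerHTML;")
        .Quit
    End With

    ' Write each category name to column A of the active sheet
    For Each post In html.getElementsByClassName("name")
        x = x + 1
        Cells(x, 1) = post.innerText
    Next post
End Sub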