使用VBA在Web上屏蔽的URL

时间:2018-07-01 16:18:04

标签: vba excel-vba dom web-scraping xmlhttprequest

我想在Excel中使用VBA从网站https://dps.psx.com.pk/抓取一些股票数据,但是问题是该网站的URL不变。

当我单击图像#1中突出显示的市场摘要时 Image#1

这将返回整个市场摘要,我只需要使用图像2中突出显示的VBA在Excel中抓取数据即可。 Image#2

我尝试用提琴手检查网络,如图3所示 Image#3

并在VBA中开发以下代码。

Option Explicit

Sub Test()

    ' Add references
    ' Microsoft HTML Object Library
    ' Microsoft XML, v6.0

    Dim sResp As String
    Dim rOutputCell As Range
    Dim oElememnt
    Dim cElements As IHTMLElementCollection
    Dim oTableRow
    Dim oTableCell



    ' Retrieve HTML from website
    With New MSXML2.XMLHTTP60
        ' Send request
        .Open "GET", "https://dps.psx.com.pk/webpages/mktSummary.php?r=REG", True
        Do Until .ReadyState = 4: DoEvents: Loop
        sResp = .ResponseText
    End With
    
    
    ' Parse response and output
    With New HTMLDocument
        ' Load response HTML into DOM
        .body.innerHTML = sResp
        ' Clear first worksheet for output
        ThisWorkbook.Sheets(1).Cells.Delete
        
        Set rOutputCell = Cells(3, 1)
        Set oElememnt = .getElementsByClassName("tableHead")(0)
        For Each oTableRow In oElememnt.getElementsByTagName("tr")
            For Each oTableCell In oTableRow.getElementsByTagName("td")
                rOutputCell.Value = oTableCell.innerText
                Set rOutputCell = rOutputCell.Offset(0, 1)
            Next
            Set rOutputCell = rOutputCell.Offset(1, 0).EntireRow.Cells(1, 1)
        Next
    
      End With

    MsgBox "Completed"

End Sub

但是当我运行这段代码时,它只是显示正在运行,但是即使等待了一段时间也没有任何反应。我不知道它是否卡在事件循环中或有其他问题,请帮忙。

1 个答案:

答案 0 :(得分:3)

所有必要的信息,以抓取您可能在Fiddler请求中捕获的数据(在浏览器手册中输入引号输入后记录的日志):

fiddler

您需要通过VBA重现该请求并解析HTML响应。有一个示例显示了如何实现:

Option Explicit

Sub Test()

    ' Add references
    ' Microsoft HTML Object Library
    ' Microsoft XML, v6.0

    Dim sResp As String
    Dim rOutputCell As Range
    Dim oElememnt
    Dim cElements As IHTMLElementCollection
    Dim oTableRow
    Dim oTableCell

    ' Retrieve HTML from website
    With New MSXML2.XMLHTTP60
        ' Send request
        .Open "POST", "https://dps.psx.com.pk/webpages/SL_main_page.php", True
        .SetRequestHeader "Content-Type", "application/x-www-form-urlencoded"
        .Send "symbolCode=EFOODS"
        Do Until .ReadyState = 4: DoEvents: Loop
        sResp = .ResponseText
    End With
    ' Parse response and output
    With New HTMLDocument
        ' Load response HTML into DOM
        .body.innerHTML = sResp
        ' Clear first worksheet for output
        ThisWorkbook.Sheets(1).Cells.Delete
        ' Parse SL_cmpInfo table and output
        Set rOutputCell = ThisWorkbook.Sheets(1).Cells(1, 1)
        Set oElememnt = .getElementsByClassName("SL_cmpText")(0)
        rOutputCell.Value = oElememnt.innerText
        ' Parse SL_mktStats1 tables and output
        Set rOutputCell = Cells(3, 1)
        Set cElements = .getElementsByClassName("SL_mktStats1")
        For Each oElememnt In Array(cElements(1), cElements(2), cElements(3))
            For Each oTableRow In oElememnt.getElementsByTagName("tr")
                For Each oTableCell In oTableRow.getElementsByTagName("td")
                    rOutputCell.Value = oTableCell.innerText
                    Set rOutputCell = rOutputCell.Offset(0, 1)
                Next
                Set rOutputCell = rOutputCell.Offset(1, 0).EntireRow.Cells(1, 1)
            Next
        Next
        ' Parse SL_announce table and output
        Set rOutputCell = rOutputCell.Offset(1, 0)
        Set oElememnt = .getElementsByClassName("SL_announce")(0)
        For Each oTableRow In oElememnt.getElementsByTagName("tr")
            For Each oTableCell In oTableRow.getElementsByTagName("td")
                rOutputCell.Value = oTableCell.innerText
                Set rOutputCell = rOutputCell.Offset(0, 1)
            Next
            Set rOutputCell = rOutputCell.Offset(1, 0).EntireRow.Cells(1, 1)
        Next
    End With
    MsgBox "Completed"

End Sub

别忘了添加必要的参考:

refs

对我来说输出如下:

output

根据需要:

requirement