Extracting dynamic content with web scraping

Time: 2020-06-03 17:36:09

Tags: excel vba web web-scraping

I know that XMLHTTP only fetches the initial page source and does not execute any dynamic updates. I don't want to automate IE because it is too slow.

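I have attached my code below. I want to extract the stock's volume on both the BSE and the NSE. However, the NSE volume can only be extracted after clicking "View NSE". When I try to extract the NSE volume, I get the error

"Object variable not set"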

Please suggest a solution. I am new to XHR, JSON, etc.

Sub PV_Extract()

    Dim wpage As New MSXML2.ServerXMLHTTP60
    Dim hdoc As New HTMLDocument


        URL = "https://money.rediff.com/companies/Asian-Paints-Ltd/11580001"
        wpage.Open "GET", URL, False
        wpage.send

        While wpage.readyState <> 4
            DoEvents
        Wend

        Set hdoc = New HTMLDocument

        hdoc.body.innerHTML = wpage.responseText

        Set today_tab_bse = hdoc.getElementsByTagName("table")(0).getElementsByTagName("tr")(1)
        Set today_tab_nse = hdoc.getElementsByTagName("table")(1).getElementsByTagName("tr")(1)

        vol_1 = today_tab_bse.getElementsByTagName("td")(0).innerText
        vol_2 = today_tab_nse.getElementsByTagName("td")(0).innerText

End Sub

1 answer:

Answer 0 (score: 0)

Always declare all of your variables. You can scrape the elements by their ID instead of by table position.

Sub PV_Extract()
    ' Requires references to "Microsoft XML, v6.0" and "Microsoft HTML Object Library".
    Dim wpage As New MSXML2.ServerXMLHTTP60
    Dim url As String
    Dim hdoc As HTMLDocument
    Dim today_tab_bse As Object
    Dim today_tab_nse As Object
    Dim vol_1 As String
    Dim vol_2 As String

    ' Synchronous request (third argument is False), so no readyState loop is needed.
    url = "https://money.rediff.com/companies/Asian-Paints-Ltd/11580001"
    wpage.Open "GET", url, False
    wpage.send

    ' Load the static HTML into a document so it can be queried.
    Set hdoc = New HTMLDocument
    hdoc.body.innerHTML = wpage.responseText

    ' Look up the BSE and NSE containers by ID.
    Set today_tab_bse = hdoc.getElementById("for_BSE")
    Set today_tab_nse = hdoc.getElementById("for_NSE")

    ' Read the first table cell inside each container.
    vol_1 = today_tab_bse.getElementsByTagName("td")(0).innerText
    vol_2 = today_tab_nse.getElementsByTagName("td")(0).innerText

    MsgBox "BSE: " & vol_1 & vbCrLf & "NSE: " & vol_2
End Sub
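
The error from the question (run-time error 91, "Object variable or With block variable not set") is what VBA raises when a member is accessed on a lookup that returned Nothing. A minimal sketch of a guard follows. The helper name TextById is hypothetical and not part of the original answer, and the sketch assumes the same "Microsoft HTML Object Library" reference as the code above.

```vba
Option Explicit

' Hypothetical helper (not from the original answer): returns the inner text
' of the element with the given ID, or an empty string when the static HTML
' contains no element with that ID.
Function TextById(ByVal doc As HTMLDocument, ByVal elementId As String) As String
    Dim el As Object

    Set el = doc.getElementById(elementId)

    If el Is Nothing Then
        ' The lookup failed; report "not found" instead of raising error 91.
        TextById = vbNullString
    Else
        TextById = el.innerText
    End If
End Function
```

Calling TextById(hdoc, "for_NSE") and testing the result against vbNullString would show whether the NSE container is present in the response at all, without stopping the macro.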

Edit: getting the values for both BSE and NSE

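Sub PV_Extract()
    ' Requires references to "Microsoft XML, v6.0" and "Microsoft HTML Object Library".
    Dim wpage As New MSXML2.ServerXMLHTTP60
    Dim url As String
    Dim hdoc As HTMLDocument
    Dim vol_1 As String
    Dim vol_2 As String

    ' Synchronous request (third argument is False), so no readyState loop is needed.
    url = "https://money.rediff.com/companies/Asian-Paints-Ltd/11580001"
    wpage.Open "GET", url, False
    wpage.send

    ' Load the static HTML into a document so it can be queried.
    Set hdoc = New HTMLDocument
    hdoc.body.innerHTML = wpage.responseText

    ' Read the BSE and NSE values directly from the elements with these IDs.
    vol_1 = hdoc.getElementById("ltpid").innerText
    vol_2 = hdoc.getElementById("ltpid_nse").innerText

    MsgBox "BSE: " & vol_1 & vbCrLf & "NSE: " & vol_2
End Sub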
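
The code above assumes the request succeeds. As a sketch only, the fetch-and-parse step could be wrapped in a function that checks the HTTP status before parsing. The name FetchHtml and the error number are illustrative assumptions, not part of the original answer. The same two library references are required.

```vba
Option Explicit

' Hypothetical helper (not from the original answer): downloads a page
' synchronously and returns its static HTML as an HTMLDocument.
' Raises an error when the server does not answer with HTTP 200.
Function FetchHtml(ByVal url As String) As HTMLDocument
    Dim req As New MSXML2.ServerXMLHTTP60
    Dim doc As New HTMLDocument

    req.Open "GET", url, False   ' False = synchronous
    req.send

    If req.Status <> 200 Then
        ' The error number is arbitrary, chosen for illustration only.
        Err.Raise vbObjectError + 513, "FetchHtml", _
                  "HTTP " & req.Status & " for " & url
    End If

    ' Scripts in the page are not executed; only the served markup is parsed.
    doc.body.innerHTML = req.responseText
    Set FetchHtml = doc
End Function
```

A caller would then write Set hdoc = FetchHtml(url) and query hdoc by ID as before. A failed request would surface as an explicit HTTP error, not as a missing element later on.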