Extracting data from a website to a .csv file

Time: 2020-10-06 03:38:34

Tags: excel vba csv web-scraping

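For a week now I have been trying every example I could find here (I even tried converting examples from JavaScript to VBA), but I could not get anything to work for the website I need.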

Website: https://www.inspq.qc.ca/covid-19/donnees

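Manually, I can click the 3 dots at the top right of each chart and then choose the option to download the file in CSV format, which saves the chart's raw data to a .csv file.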

When I inspect the element, I see:

<g class="highcharts-button highcharts-contextbutton                 highcharts-button-normal" stroke-linecap="round" transform="translate(1412,10)"><rect fill="#ffffff" class="highcharts-button-box" x="0.5" y="0.5" width="24" height="22" rx="2" ry="2" stroke="none" stroke-width="1"></rect><title>Chart context menu</title><path fill="#666666" d="M 12.666666666666666 6.666666666666668 A 1.3333333333333335 1.3333333333333335 0 1 1 12.667999999777779 6.666666000000056 Z M 12.666666666666666 13.333333333333336 A 1.3333333333333335 1.3333333333333335 0 1 1 12.667999999777779 13.333332666666724 Z M 12.666666666666666 20 A 1.3333333333333335 1.3333333333333335 0 1 1 12.667999999777779 19.999999333333392 Z" class="highcharts-button-symbol" data-z-index="1" stroke="#666666" stroke-width="3"></path><text x="0" data-z-index="1" style="color:#333333;cursor:pointer;font-weight:normal;fill:#333333;" y="12"></text></g>

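But I don't know how to adapt any of the examples I found here so that they click this button and then select the option that exports the raw data to a CSV file.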

1 Answer:

Answer 0: (score: 0)

We don't have to deal with the JavaScript ourselves, because IE runs it for us automatically. We "only" have to automate IE.

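To handle the downloads we have to use SendKeys(). This is not very elegant, but I don't know of any other solution for downloads like these. As a consequence, IE must be visible and must have the focus!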

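The files are saved in IE's default download directory. They all have the same name, chart.csv. That is not a problem, because Windows adds its own number to each file name. The file without a number is therefore the first one, and all the others are numbered in download order.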
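To illustrate the numbering, here is a minimal sketch that walks through the saved files in download order. It assumes that the duplicates are named "chart (1).csv", "chart (2).csv" and so on, and that the files land in the Downloads folder of the user profile. Both are assumptions, so adjust them to what you actually see on disk. ChartFileName and ListDownloadedCharts are hypothetical helper names.

```vba
'Hypothetical helper: builds the expected file name for the n-th download.
'Assumption: duplicates are numbered as "chart (1).csv", "chart (2).csv", ...
Function ChartFileName(ByVal index As Long) As String
  If index = 0 Then
    ChartFileName = "chart.csv"
  Else
    ChartFileName = "chart (" & index & ").csv"
  End If
End Function

'Prints the downloaded files in the order they were saved
Sub ListDownloadedCharts()
  Dim folder As String
  Dim index As Long

  'Assumption: IE saves to the Downloads folder of the user profile
  folder = Environ$("USERPROFILE") & "\Downloads\"

  'Stop at the first number for which no file exists
  Do While Len(Dir$(folder & ChartFileName(index))) > 0
    Debug.Print index, folder & ChartFileName(index)
    index = index + 1
  Loop
End Sub
```

Run ListDownloadedCharts after the downloads have finished. If old chart.csv files from earlier runs are still in the folder, the numbering will be offset, so it may be worth clearing them first.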

I have commented the macro below so that you can see how it works:

Sub DownloadCovidCSVs()

Const url As String = "https://www.inspq.qc.ca/covid-19/donnees"

Dim ie As Object
Dim nodesAllThreeDots As Object
Dim nodeOneThreeDots As Object
Dim nodeMenueEntries As Object
Dim timeout As Double
Dim failed As Boolean

  'Initialize Internet Explorer, set visibility,
  'call URL and wait until page is fully loaded
  Set ie = CreateObject("internetexplorer.application")
  ie.Visible = True
  ie.navigate url
  Do Until ie.ReadyState = 4: DoEvents: Loop
  
  'Start time for the timeout in case the charts don't load
  timeout = Timer
  'Wait until the charts are loaded or the timeout takes effect
  Do
    'Keep Excel responsive while polling
    DoEvents
    Set nodesAllThreeDots = ie.document.getElementsByClassName("highcharts-contextbutton")
  Loop Until nodesAllThreeDots.Length > 0 Or Timer - timeout > 30 'Timeout in seconds
  
  'Check whether the charts failed to load
  If nodesAllThreeDots.Length = 0 Then
    failed = True
  End If
  
  'If charts were loaded, download csv files
  If Not failed Then
    For Each nodeOneThreeDots In nodesAllThreeDots
      'Open current menu
      nodeOneThreeDots.Click
      
      'Get the last menu entry and click it to start the download
      Set nodeMenueEntries = ie.document.getElementsByClassName("highcharts-menu-item")
      nodeMenueEntries(nodeMenueEntries.Length - 1).Click
      
      'Give the server time to generate the document and IE time to show the download bar at the bottom of its window
      Application.Wait (Now + TimeValue("0:00:03"))
      
      'Attention!
      'We have to use SendKeys to perform the download!!!
      'As long as the macro is running, keep your fingers away from the mouse and keyboard!!!
      'Alt+S presses the Save button in IE's download bar
      Application.SendKeys ("%{S}")
    Next nodeOneThreeDots
  End If
  
  'Clean up
  ie.Quit
  Set nodesAllThreeDots = Nothing
  Set nodeOneThreeDots = Nothing
  Set nodeMenueEntries = Nothing
End Sub
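
One caveat about the wait loop: Timer returns the seconds elapsed since midnight, so the comparison Timer - timeout > 30 can misbehave if the macro happens to run across midnight. As a possible variation, the polling could be moved into a small helper that uses a deadline based on Now instead. This is only a sketch. WaitForNodes is a hypothetical name, and it relies on the same getElementsByClassName call that the macro above already uses.

```vba
'Hypothetical helper: polls the document until at least one element with
'the given class exists or the deadline has passed.
'Returns the (possibly empty) node collection.
Function WaitForNodes(ByVal doc As Object, ByVal className As String, _
                      ByVal maxSeconds As Double) As Object
  Dim deadline As Date
  Dim nodes As Object

  'Now counts in days, so convert the seconds to a fraction of a day
  deadline = Now + maxSeconds / 86400#

  Do
    'Keep Excel responsive while polling
    DoEvents
    Set nodes = doc.getElementsByClassName(className)
  Loop Until nodes.Length > 0 Or Now > deadline

  Set WaitForNodes = nodes
End Function
```

In the macro it could be called as Set nodesAllThreeDots = WaitForNodes(ie.document, "highcharts-contextbutton", 30), followed by the same Length = 0 check as before.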