Scraping data from a website into Excel

Date: 2017-02-05 12:19:36

Tags: excel vba internet-explorer automation

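I am using VBA to navigate to a site and get to the place where I need to work with a form. I need VBA to take some data from the HTML and put it into an Excel worksheet. How can I scrape this data into Excel?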

Here is an excerpt of the page's HTML:

<div id="ctl00_ContentPlaceHolder1_gridDebitosUsuario1_pnlListaContas">


            <table width="800px" border="0">
                <tr>

                    <td id="ctl00_ContentPlaceHolder1_gridDebitosUsuario1_rptContasAberto_ctl00_tdRadioButtonHeader" width="35px" class="td_titulo"></td>

                    <td width="130px" class="td_titulo">Nº Conta Energia
                    </td>
                    <td width="100px" class="td_titulo">Descrição Fatura
                    </td>
                    <td width="80px" class="td_titulo">Mês Ref.
                    </td>
                    <td width="100px" class="td_titulo">Vencimento
                    </td>

                    <td width="100px" class="td_titulo">Valor
                    </td>
                    <td id="ctl00_ContentPlaceHolder1_gridDebitosUsuario1_rptContasAberto_ctl00_tdCodBarrasHeader" width="200px" class="td_titulo">Código de Barras
                    </td>


                </tr>


            <tr id="ctl00_ContentPlaceHolder1_gridDebitosUsuario1_rptContasAberto_ctl01_linha">
            <td id="ctl00_ContentPlaceHolder1_gridDebitosUsuario1_rptContasAberto_ctl01_tdRadioButtonItem" width="35px" class="td_branco">
                    <input id="ctl00_ContentPlaceHolder1_gridDebitosUsuario1_rptContasAberto_ctl01_rbConta" type="radio" name="ctl00$ContentPlaceHolder1$gridDebitosUsuario1$rptContasAberto$ctl01$rbConta" value="0201701001618299" onclick="SetUniqueRadioButton(&#39;rptContasAberto.*rbConta&#39;,this);" />
                </td>
            <td width="130px" class="td_branco">
                    <span id="ctl00_ContentPlaceHolder1_gridDebitosUsuario1_rptContasAberto_ctl01_lblNumeroConta">0201701001618299</span>

I am trying the code below, but it does not work:

Set xobj = objIe.Document.getElementById("ctl00_ContentPlaceHolder1_gridDebitosUsuario1_pnlListaContas")
    Set xobj = xobj.getElementsById("ctl00_ContentPlaceHolder1_gridDebitosUsuario1_rptContasAberto_ctl00_tdRadioButtonHeader")
    Set xobj = xobj.getElementsByClassName("ctl00_ContentPlaceHolder1_gridDebitosUsuario1_rptContasAberto_ctl01_lblNumeroConta")(0)

    MsgBox xobj.innerText

2 answers:

Answer 0 (score: 1)

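This does not directly answer your question, since you asked for an Excel solution, but when I need to scrape a website and get the results into Excel, I use Web Scraper, an extension for Chrome. It is a bit annoying at first, because the extension is not intuitive and the help is very limited, but once you get the hang of it, it works well. The scraped results can be exported as CSV.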

http://webscraper.io/

Answer 1 (score: 1)

I think this is the easiest way.

Sub DumpData()

Dim IE As Object, itm As Object
Dim URL As String
Dim RowCount As Long

Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True

URL = "http://finance.yahoo.com/q?s=sbux&ql=1"

'Wait for the site to fully load (4 = READYSTATE_COMPLETE)
IE.Navigate2 URL
Do While IE.Busy Or IE.readyState <> 4
   DoEvents
Loop

With Sheets("Sheet1")
   .Cells.ClearContents
   RowCount = 1
   'One row per element: tag name, id, class name, first 1024 characters of text
   For Each itm In IE.document.all
      .Range("A" & RowCount) = itm.tagname
      .Range("B" & RowCount) = itm.ID
      .Range("C" & RowCount) = itm.classname
      .Range("D" & RowCount) = Left(itm.innertext, 1024)

      RowCount = RowCount + 1
   Next itm
End With
End Sub

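Run that script and Sheet1 will list every element on the page with its tag name, id, class name, and text, so you should have everything you need.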
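For the page in the question, a more targeted version may be easier to work with than a full dump. The code in the question fails for two reasons: there is no `getElementsById` method (the method is `getElementById`, singular, and it is called on the document), and `getElementsByClassName` expects a class name such as `td_branco`, not an id. Below is a minimal sketch, assuming the page has already finished loading in `objIe` and that the bills table is the first `<table>` inside the panel, as in the HTML excerpt. The name `WriteDebitsToSheet` and its parameters are made up for illustration.

```vba
Option Explicit

' Hypothetical helper: copies every cell of the bills table into ws.
' doc is the already-loaded page, e.g. objIe.Document from the question.
Sub WriteDebitsToSheet(ByVal doc As Object, ByVal ws As Worksheet)
    Const PANEL_ID As String = _
        "ctl00_ContentPlaceHolder1_gridDebitosUsuario1_pnlListaContas"
    Dim panel As Object, tbl As Object, tr As Object, td As Object
    Dim r As Long, c As Long

    ' Ids are unique in the page, so look the panel up directly.
    Set panel = doc.getElementById(PANEL_ID)
    If panel Is Nothing Then
        MsgBox "Panel not found; the page may not have finished loading."
        Exit Sub
    End If

    ' Assumption: the first <table> inside the panel is the bills table.
    Set tbl = panel.getElementsByTagName("table")(0)
    If tbl Is Nothing Then Exit Sub

    r = 1
    For Each tr In tbl.Rows
        c = 1
        For Each td In tr.Cells
            ' Store as text so 16-digit account numbers keep their
            ' leading zero and are not rounded to 15 digits by Excel.
            ws.Cells(r, c).NumberFormat = "@"
            ws.Cells(r, c).Value = Trim$(td.innerText)
            c = c + 1
        Next td
        r = r + 1
    Next tr
End Sub
```

It could be called as `WriteDebitsToSheet objIe.Document, ThisWorkbook.Sheets("Sheet1")`. If only the single account number is needed, `objIe.Document.getElementById("ctl00_ContentPlaceHolder1_gridDebitosUsuario1_rptContasAberto_ctl01_lblNumeroConta").innerText` should return it directly.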

Thanks, Joel!