如何使用变量表示链接?

时间:2018-06-22 09:27:40

标签: excel vba web-scraping

我记录了一个宏,并尝试使用for循环将其与我要从中抓取数据的不同链接进行匹配。

问题在于,VBA无法将我的变量识别为链接。当我直接在代码中键入链接时,它可以工作。我不仅需要来自一个链接的数据,而且还需要来自500个链接的数据。

这是我的代码片段:

Dim Link As String
Link = "https://coinmarketcap.com/currencies/bitcoin/historical-data/"
For i = 1 To 5
Link = Cells(i, 1)

     ActiveWorkbook.Queries.Add Name:="Table 0 (3)", Formula:= _
        "let" & Chr(13) & "" & Chr(10) & "    Quelle = Web.Page(Web.Contents(""https://coinmarketcap.com/currencies/ontology/historical-data/""))," & Chr(13) & "" & Chr(10) & "    Data0 = Quelle{0}[Data]," & Chr(13) & "" & Chr(10) & "    #""Geänderter Typ"" = Table.TransformColumnTypes(Data0,{{""Date"", type date}, {""Open*"", type number}, {""High"", type number}, {""Low"", type number}, {""Close**"", type number}, {""Volume"", type number}, {""Market Cap" & _
        """, type number}})" & Chr(13) & "" & Chr(10) & "in" & Chr(13) & "" & Chr(10) & "    #""Geänderter Typ"""
    With ActiveSheet.ListObjects.Add(SourceType:=0, Source:= _
        "OLEDB;Provider=Microsoft.Mashup.OleDb.1;Data Source=$Workbook$;Location=""Table 0 (3)"";Extended Properties=""""" _
        , Destination:=Range("$D$1")).QueryTable
        .CommandType = xlCmdSql
        .CommandText = Array("SELECT * FROM [Table 0 (3)]")
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .BackgroundQuery = True
        .RefreshStyle = xlInsertDeleteCells
        .SavePassword = False
        .SaveData = True
        .AdjustColumnWidth = True
        .RefreshPeriod = 0
        .PreserveColumnInfo = True
        .ListObject.DisplayName = "Table_0__3"
        .Refresh BackgroundQuery:=False
    End With
Next

一旦我更改了变量“ link”的链接(““ https://coinmarketcap.comblabla””),就会收到应用程序或对象定义的错误。当我深入研究并单击数组时,Excel告诉我导入“链接”未连接到导出。

2 个答案:

答案 0 :(得分:0)

您可以使用下面的代码获取主要的历史数据表和上面的信息。这有点棘手,有些脆弱,因为其中很多都依赖于当前的页面样式,而页面样式可能会发生变化。历史数据位(它是实际表)更加健壮。

例如,您可以循环使用从单元格中选取的新URL,并且在每个循环开始时只需插入Sheets.Add行,这样就可以使用新的Activesheet向其中写入数据。

下面,根据您的要求应该足以让您入门。


我得到了最高的评价:

top bit

使用  .Cells(1, 1) = IE.document.querySelector(".col-xs-6.col-sm-8.col-md-4.text-left").innerText。这不是很可靠。文档的样式可以更改。但是,这不是页面上容易访问的部分,获取它很可能会受到当前选择的任何方法的攻击。我正在使用元素的类名(".")通过文档的.querySelector方法来应用CSS selector .col-xs-6.col-sm-8.col-md-4.text-left来检索信息。与.getElementsByClassName(0)相同。


我得到中间点:

middle

使用

Set aNodeList = IE.document.querySelectorAll("[class*='coin-summary'] div")

这使用CSS选择器[class*='coin-summary'] div,它们是元素中的div标签,其className包含字符串'coin-summary'

该CSS选择器返回一个列表,因此使用.querySelectorAll方法返回一个nodeLIst,然后将其遍历。

List returned by CSS selector


我使用table标记获得最终的历史数据(这是一个实际的表):

Set hTable = .document.getElementsByTagName("table")(0)

然后我遍历表的行和行中的单元格。


VBA:

Option Explicit
Public Sub GetInfo()
    Dim IE As Object
    Set IE = CreateObject("InternetExplorer.Application")
    Application.ScreenUpdating = False
    With IE
        .Visible = True
        .navigate "https://coinmarketcap.com/currencies/bitcoin/historical-data/"

        While .Busy Or .readyState < 4: DoEvents: Wend '<== Loop until loaded

        Dim hTable As HTMLTable
        Set hTable = .document.getElementsByTagName("table")(0)

        Dim tSection As Object, tRow As Object, tCell As Object, tr As Object, td As Object, r As Long, c As Long, hBody As Object
        Dim headers(), headers2()
        headers = Array("Date", "Open*", "High", "Low", "Close**", "volume", "Market Cap")
        headers2 = Array("Market Cap", "Volume (24h)", "Circulating Supply", "Max Supply")

        With ActiveSheet
            .Cells.ClearContents
            .Cells(1, 1) = IE.document.querySelector(".col-xs-6.col-sm-8.col-md-4.text-left").innerText
            Dim aNodeList As Object, i As Long, resumeRow As Long
            Set aNodeList = IE.document.querySelectorAll("[class*='coin-summary'] div")
            resumeRow = .Cells(.Rows.Count, "A").End(xlUp).Row + 2
            .Range("A" & resumeRow).Resize(1, UBound(headers2) + 1) = headers2

            For i = 0 To aNodeList.Length - 1
                .Cells(resumeRow + 1, i + 1) = aNodeList.item(i).innerText
            Next i

            r = .Cells(.Rows.Count, "A").End(xlUp).Row + 2

            .Cells(r, 1).Resize(1, UBound(headers) + 1) = headers
            Set hBody = hTable.getElementsByTagName("tbody")
            For Each tSection In hBody           'HTMLTableSection
                Set tRow = tSection.getElementsByTagName("tr") 'HTMLTableRow
                For Each tr In tRow
                    r = r + 1
                    Set tCell = tr.getElementsByTagName("td")
                    c = 1
                    For Each td In tCell         'DispHTMLElementCollection
                        .Cells(r, c).Value = td.innerText 'HTMLTableCell
                        c = c + 1
                    Next td

                Next tr
            Next tSection


        End With

        'Quit '<== Remember to quit application
        Application.ScreenUpdating = True
    End With
End Sub

工作表中的输出(示例):

Example output


页面中的一些示例数据:

Example data

答案 1 :(得分:0)

这将从该表中获取数据。

Option Explicit
Sub Web_Table_Option_One()
    Dim xml    As Object
    Dim html   As Object
    Dim objTable As Object
    Dim result As String
    Dim lRow As Long
    Dim lngTable As Long
    Dim lngRow As Long
    Dim lngCol As Long
    Dim ActRw As Long
    Set xml = CreateObject("MSXML2.XMLHTTP.6.0")
    With xml
        .Open "GET", "https://coinmarketcap.com/currencies/bitcoin/historical-data/", False
        .send
    End With
    result = xml.responseText
    Set html = CreateObject("htmlfile")
    html.body.innerHTML = result
    Set objTable = html.getElementsByTagName("Table")
    For lngTable = 0 To objTable.Length - 1
        For lngRow = 0 To objTable(lngTable).Rows.Length - 1
            For lngCol = 0 To objTable(lngTable).Rows(lngRow).Cells.Length - 1
                ThisWorkbook.Sheets("Sheet1").Cells(ActRw + lngRow + 1, lngCol + 1) = objTable(lngTable).Rows(lngRow).Cells(lngCol).innerText
            Next lngCol
        Next lngRow
        ActRw = ActRw + objTable(lngTable).Rows.Length + 1
    Next lngTable
End Sub

您当然可以循环访问一组URL,然后遍历每个URL。这500个网址在哪里?如果它们与您提供的内容不同,则可能需要为您剪裁工作。通常,所有网站都非常不同,并且屏幕抓取是高度定制的过程。