我记录了一个宏,并尝试使用for循环将其与我要从中抓取数据的不同链接进行匹配。
问题在于,VBA无法将我的变量识别为链接。当我直接在代码中键入链接时,它可以工作。我不仅需要来自一个链接的数据,而且还需要来自500个链接的数据。
这是我的代码片段:
Dim Link As String
Link = "https://coinmarketcap.com/currencies/bitcoin/historical-data/"
For i = 1 To 5
Link = Cells(i, 1)
ActiveWorkbook.Queries.Add Name:="Table 0 (3)", Formula:= _
"let" & Chr(13) & "" & Chr(10) & " Quelle = Web.Page(Web.Contents(""https://coinmarketcap.com/currencies/ontology/historical-data/""))," & Chr(13) & "" & Chr(10) & " Data0 = Quelle{0}[Data]," & Chr(13) & "" & Chr(10) & " #""Geänderter Typ"" = Table.TransformColumnTypes(Data0,{{""Date"", type date}, {""Open*"", type number}, {""High"", type number}, {""Low"", type number}, {""Close**"", type number}, {""Volume"", type number}, {""Market Cap" & _
""", type number}})" & Chr(13) & "" & Chr(10) & "in" & Chr(13) & "" & Chr(10) & " #""Geänderter Typ"""
With ActiveSheet.ListObjects.Add(SourceType:=0, Source:= _
"OLEDB;Provider=Microsoft.Mashup.OleDb.1;Data Source=$Workbook$;Location=""Table 0 (3)"";Extended Properties=""""" _
, Destination:=Range("$D$1")).QueryTable
.CommandType = xlCmdSql
.CommandText = Array("SELECT * FROM [Table 0 (3)]")
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.BackgroundQuery = True
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.PreserveColumnInfo = True
.ListObject.DisplayName = "Table_0__3"
.Refresh BackgroundQuery:=False
End With
Next
一旦我更改了变量“ link”的链接(““ https://coinmarketcap.comblabla””),就会收到应用程序或对象定义的错误。当我深入研究并单击数组时,Excel告诉我导入“链接”未连接到导出。
答案 0 :(得分:0)
您可以使用下面的代码获取主要的历史数据表和上面的信息。这有点棘手,有些脆弱,因为其中很多都依赖于当前的页面样式,而页面样式可能会发生变化。历史数据位(它是实际表)更加健壮。
例如,您可以循环使用从单元格中选取的新URL,并且在每个循环开始时只需插入Sheets.Add
行,这样就可以使用新的Activesheet向其中写入数据。
下面,根据您的要求应该足以让您入门。
我得到了最高的评价:
使用
.Cells(1, 1) = IE.document.querySelector(".col-xs-6.col-sm-8.col-md-4.text-left").innerText
。这不是很可靠。文档的样式可以更改。但是,这不是页面上容易访问的部分,获取它很可能会受到当前选择的任何方法的攻击。我正在使用元素的类名("."
)通过文档的.querySelector
方法来应用CSS selector .col-xs-6.col-sm-8.col-md-4.text-left
来检索信息。与.getElementsByClassName(0)
相同。
我得到中间点:
使用
Set aNodeList = IE.document.querySelectorAll("[class*='coin-summary'] div")
这使用CSS选择器[class*='coin-summary'] div
,它们是元素中的div
标签,其className包含字符串'coin-summary'
。
该CSS选择器返回一个列表,因此使用.querySelectorAll
方法返回一个nodeLIst,然后将其遍历。
我使用table标记获得最终的历史数据(这是一个实际的表):
Set hTable = .document.getElementsByTagName("table")(0)
然后我遍历表的行和行中的单元格。
VBA:
Option Explicit
Public Sub GetInfo()
Dim IE As Object
Set IE = CreateObject("InternetExplorer.Application")
Application.ScreenUpdating = False
With IE
.Visible = True
.navigate "https://coinmarketcap.com/currencies/bitcoin/historical-data/"
While .Busy Or .readyState < 4: DoEvents: Wend '<== Loop until loaded
Dim hTable As HTMLTable
Set hTable = .document.getElementsByTagName("table")(0)
Dim tSection As Object, tRow As Object, tCell As Object, tr As Object, td As Object, r As Long, c As Long, hBody As Object
Dim headers(), headers2()
headers = Array("Date", "Open*", "High", "Low", "Close**", "volume", "Market Cap")
headers2 = Array("Market Cap", "Volume (24h)", "Circulating Supply", "Max Supply")
With ActiveSheet
.Cells.ClearContents
.Cells(1, 1) = IE.document.querySelector(".col-xs-6.col-sm-8.col-md-4.text-left").innerText
Dim aNodeList As Object, i As Long, resumeRow As Long
Set aNodeList = IE.document.querySelectorAll("[class*='coin-summary'] div")
resumeRow = .Cells(.Rows.Count, "A").End(xlUp).Row + 2
.Range("A" & resumeRow).Resize(1, UBound(headers2) + 1) = headers2
For i = 0 To aNodeList.Length - 1
.Cells(resumeRow + 1, i + 1) = aNodeList.item(i).innerText
Next i
r = .Cells(.Rows.Count, "A").End(xlUp).Row + 2
.Cells(r, 1).Resize(1, UBound(headers) + 1) = headers
Set hBody = hTable.getElementsByTagName("tbody")
For Each tSection In hBody 'HTMLTableSection
Set tRow = tSection.getElementsByTagName("tr") 'HTMLTableRow
For Each tr In tRow
r = r + 1
Set tCell = tr.getElementsByTagName("td")
c = 1
For Each td In tCell 'DispHTMLElementCollection
.Cells(r, c).Value = td.innerText 'HTMLTableCell
c = c + 1
Next td
Next tr
Next tSection
End With
'Quit '<== Remember to quit application
Application.ScreenUpdating = True
End With
End Sub
工作表中的输出(示例):
页面中的一些示例数据:
答案 1 :(得分:0)
这将从该表中获取数据。
Option Explicit
Sub Web_Table_Option_One()
Dim xml As Object
Dim html As Object
Dim objTable As Object
Dim result As String
Dim lRow As Long
Dim lngTable As Long
Dim lngRow As Long
Dim lngCol As Long
Dim ActRw As Long
Set xml = CreateObject("MSXML2.XMLHTTP.6.0")
With xml
.Open "GET", "https://coinmarketcap.com/currencies/bitcoin/historical-data/", False
.send
End With
result = xml.responseText
Set html = CreateObject("htmlfile")
html.body.innerHTML = result
Set objTable = html.getElementsByTagName("Table")
For lngTable = 0 To objTable.Length - 1
For lngRow = 0 To objTable(lngTable).Rows.Length - 1
For lngCol = 0 To objTable(lngTable).Rows(lngRow).Cells.Length - 1
ThisWorkbook.Sheets("Sheet1").Cells(ActRw + lngRow + 1, lngCol + 1) = objTable(lngTable).Rows(lngRow).Cells(lngCol).innerText
Next lngCol
Next lngRow
ActRw = ActRw + objTable(lngTable).Rows.Length + 1
Next lngTable
End Sub
您当然可以循环访问一组URL,然后遍历每个URL。这500个网址在哪里?如果它们与您提供的内容不同,则可能需要为您剪裁工作。通常,所有网站都非常不同,并且屏幕抓取是高度定制的过程。