我目前正在尝试从Morningstar的一张表中抓取某些数据,然后将其循环到下一个代码,并重复进行直到没有更多的代码为止。
当前,它将拉动“追踪总回报”表上的整个“类别内排名”行。我只是想拉3个月,6个月,年初至今,1年,3年和5年。完成这些操作后,它将循环到导航行中“ Cells(p,14)”所确定的下一个股票行情。
即。它检测到“ LINKX”位于第1、14单元格中,因此导航到http://performance.morningstar.com/fund/performance-return.action?t=LINKX®ion=usa&culture=en_US,并从“跟踪总收益”表中拉出所有“类别等级”行。我只希望将指定的那些放入指定的单元格位置,然后循环到下一个自动收录器。
我已经使用excel VBA浏览了许多这些线程,我试图从某个行情指示器页面中提取关键的特定信息,然后循环到下一个行情指示器并重复。
Declare PtrSafe Function apiShowWindow Lib "user32" Alias "ShowWindow" _
(ByVal hwnd As LongPtr, ByVal nCmdShow As LongPtr) As LongPtr
Global Const SW_MAXIMIZE = 3
Global Const SW_SHOWNORMAL = 1
Global Const SW_SHOWMINIMIZED = 2
Sub LinkedInWebScrapeScript()
Dim objIE As InternetExplorer
Dim html As HTMLDocument
Set objIE = GetObject("new:{D5E8041D-920F-45e9-B8FB-B1DEB82C6E5E}")
objIE.Visible = 1
Dim p As Integer
p = 3
objIE.navigate ("http://performance.morningstar.com/fund/performance-return.action?t=" & Cells(p, 14) & "®ion=usa&culture=en_US")
Application.Wait Now + #12:00:02 AM#
While objIE.Busy
DoEvents
Wend
apiShowWindow objIE.hwnd, SW_MAXIMIZE
For i = 1 To 2
objIE.document.parentWindow.scrollBy 0, 100000 & i
Application.Wait Now + #12:00:01 AM#
Next i
Dim TDelements As IHTMLElementCollection
Dim htmldoc As MSHTML.IHTMLDocument 'Document object
Dim eleColtr As MSHTML.IHTMLElementCollection 'Element collection for tr tags
Dim eleColtd As MSHTML.IHTMLElementCollection 'Element collection for td tags
Dim eleColtd1 As MSHTML.IHTMLElementCollection
Dim eleRow As MSHTML.IHTMLElement 'Row elements
Dim eleCol As MSHTML.IHTMLElement 'Column elements
Set htmldoc = objIE.document 'Document webpage
Set eleColtr = htmldoc.getElementsByTagName("tr") 'Find all tr tags
Set TDelements = htmldoc.getElementsByTagName("table")
'This section populates Excel
i = 0 'start with first value in tr collection
Set eleColtd = htmldoc.getElementsByClassName("r_table3 width955px print97")(0).getElementsByClassName("last")(0).getElementsByClassName("row_data divide") 'get all the td elements in that specific tr
For Each eleCol In eleColtd 'for each element in the td collection
Sheets("Sheet2").Range("A1").Offset(i, j).Value = eleCol.innerText 'paste the inner text of the td element, and offset at the same time
j = j + 1 'move to next element in td collection
Next eleCol 'rinse and repeat
i = i + 1
p = p + 1
objIE.navigate ("http://performance.morningstar.com/fund/performance-return.action?t=" & Cells(p, 14) & "®ion=usa&culture=en_US")
Set eleColtd = htmldoc.getElementsByClassName("r_table3 width955px print97")(0).getElementsByClassName("last")(0).getElementsByClassName("row_data divide") 'get all the td elements in that specific tr
For Each eleCol In eleColtd 'for each element in the td collection
Sheets("Sheet2").Range("A1").Offset(i, j).Value = eleCol.innerText 'paste the inner text of the td element, and offset at the same time
z = z + 1
j = j + 1 'move to next element in td collection
Next eleCol 'rinse and repeat
End Sub
这将拉动“追踪总回报”表上的整个“类别中的排名”行。我只是想拉3个月,6个月,年初至今,1年,3年和5年。完成这些操作后,它将循环到导航行中“ Cells(p,14)”所确定的下一个股票行情。
答案 0 :(得分:2)
以下内容显示了一个循环以及如何选择适当的表,然后使用css selectors来填充表单元格。代码从第1行的第N列读取到一个数组中。它假设范围内没有空白单元格(尽管可以确定要添加一个测试)。
在数组上有一个循环,其中包含每个报价器,URL中的TICKER占位符将替换为当前报价器值。
在“每月显示”标签上单击一行。
通过以下行标识适当的行
Set rankings = .querySelectorAll("#tab-month-end-content .last td")
#tab-month-end-content
是一个ID选择器,它会在右侧显示标签,然后.last
是最后一个tbody
(即last
)的类名的类选择器,然后td
用于指定该tbody中的子td
单元。
CSS选择器:
现代浏览器针对CSS进行了优化。 CSS选择器是匹配html文档中元素的一种快速方法。 CSS选择器是通过querySelector或querySelectorAll
方法应用的;在这种情况下,为HTMLDocument
(即文档)。 querySelector
返回一个节点:css选择器的第一个匹配项; querySelectorAll
返回所有匹配项的nodeList-然后索引该nodeList以获取特定项,例如第二个td单元格位于索引1。
查看我们指定的模式:
#tab-month-end-content .last td
第一部分是id selector,#
,它通过ID选择一个元素
#tab-month-end-content
应用于页面时,这将返回两个匹配项,我们需要第二个匹配项
点击图片放大
下一部分
.last
是类名.
的{{3}},last
。这将选择上图所示的tbody
标签子元素。由于只有第二个id匹配的元素有这个孩子,所以我们现在使用正确的父元素继续工作,并使用class selector选择td
类型的元素
td
上述每个部分之间的空白被称为type selector,如果它们的祖先元素与第一个选择器匹配,则它们指定选择第二个选择器匹配的元素,即左边的选择器是右边的相邻css选择器检索到的与选择器匹配的元素的父级。
我们可以在下一张图片中看到它:
点击图片放大
VBA:
Option Explicit
Public Sub GetData()
Dim ie As Object, tickers(), ws As Worksheet, lastRow As Long
Dim results(), headers(), r As Long, i As Long, url As String
headers = Array("ticker", "3m", "6m", "ytd", "1y", "3y", "6y")
Set ws = ThisWorkbook.Worksheets("Sheet1")
tickers = Application.Transpose(ws.Range("N1:N" & GetLastRow(ws, 14)).Value)
ReDim results(1 To UBound(tickers), 1 To UBound(headers) + 1)
Set ie = GetObject("new:{D5E8041D-920F-45e9-B8FB-B1DEB82C6E5E}")
With ie
.Visible = True
For i = LBound(tickers) To UBound(tickers)
r = r + 1
url = Replace$("http://performance.morningstar.com/fund/performance-return.action?t=TICKER®ion=usa&culture=en_US", "TICKER", tickers(i))
.Navigate2 url
While .Busy Or .readyState < 4: DoEvents: Wend
.document.querySelector("[tabname='#tabmonth']").Click
Dim rankings As Object
Do
Loop While .document.querySelectorAll("#tab-month-end-content .last td").Length = 0 'could add timed loop here
With .document
Set rankings = .querySelectorAll("#tab-month-end-content .last td")
On Error Resume Next
results(r, 1) = tickers(i)
results(r, 2) = rankings.item(1).innerText
results(r, 3) = rankings.item(2).innerText
results(r, 4) = rankings.item(3).innerText
results(r, 5) = rankings.item(4).innerText
results(r, 6) = rankings.item(5).innerText
results(r, 7) = rankings.item(6).innerText
On Error GoTo 0
End With
Set rankings = Nothing
Next
ws.Cells(1, 15).Resize(UBound(results, 1), UBound(results, 2)) = results
.Quit
End With
End Sub
Public Function GetLastRow(ByVal ws As Worksheet, Optional ByVal columnNumber As Long = 1) As Long
With ws
GetLastRow = .Cells(.rows.Count, columnNumber).End(xlUp).Row
End With
End Function
如@SIM所述,您可以使用并避免使用浏览器,尽管不确定您的安全设置是否需要将网站列入白名单。您需要在以下网址中查询占位符是否有效:XNAS:TICKER
。 XNAS
前缀可能会随您的代码而有所不同,在这种情况下,您将需要适当的字符串(包括N列中的前缀),然后将扩展名占位符替换为例如。 ..... =PLACEHOLDER®ion
..
Option Explicit
Public Sub GetData()
Dim tickers(), ws As Worksheet, lastRow As Long
Dim results(), headers(), r As Long, i As Long, url As String, html As HTMLDocument
Set html = New HTMLDocument 'vbe > tools > references > Microsoft HTML Object Library
headers = Array("ticker", "3m", "6m", "ytd", "1y", "3y", "6y")
Set ws = ThisWorkbook.Worksheets("Sheet1")
tickers = Application.Transpose(ws.Range("N1:N" & GetLastRow(ws, 14)).Value)
ReDim results(1 To UBound(tickers), 1 To UBound(headers) + 1)
With CreateObject("MSXML2.XMLHTTP")
For i = LBound(tickers) To UBound(tickers)
r = r + 1
url = Replace$("http://performance.morningstar.com/perform/Performance/fund/trailing-total-returns.action?&t=XNAS:TICKER®ion=usa&culture=en-US&cur=&ops=clear&s=0P0000J533&ndec=2&ep=true&align=m&annlz=true&comparisonRemove=false&loccat=&taxadj=&benchmarkSecId=&benchmarktype=", "TICKER", tickers(i))
.Open "GET", url, False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.setRequestHeader "DNT", "1"
.send
html.body.innerHTML = .responseText
Dim rankings As Object
With html
Set rankings = .querySelectorAll(".last td")
On Error Resume Next
results(r, 1) = tickers(i)
results(r, 2) = rankings.item(1).innerText
results(r, 3) = rankings.item(2).innerText
results(r, 4) = rankings.item(3).innerText
results(r, 5) = rankings.item(4).innerText
results(r, 6) = rankings.item(5).innerText
results(r, 7) = rankings.item(6).innerText
On Error GoTo 0
End With
Set rankings = Nothing
Next
ws.Cells(1, 15).Resize(UBound(results, 1), UBound(results, 2)) = results
End With
End Sub
Public Function GetLastRow(ByVal ws As Worksheet, Optional ByVal columnNumber As Long = 1) As Long
With ws
GetLastRow = .Cells(.rows.Count, columnNumber).End(xlUp).Row
End With
End Function