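I'm trying to pull some table entries from a website into Excel.
I'm new to VBA in Excel, but not new to coding in general :)
I tried Excel's Data > From Web interface, but it doesn't recognise the table. I'm guessing that's because the page is built using frames (or at least that's what my Google-Fu leads me to understand).
Snippet of what the second table looks like: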
<html>
<frame title="links" ...>...</frame>
<frame title="queue">
#document
<head>...</head>
<body>
<div id="container">
<script>...</script>
<div>
<table id="oTable">
<colgroup>...</colgroup>
<thead>...</thead>
<tbody>
<tr onclick="changeHighlight( 'eid0' )" id="eid0" class="queryshaded">
<td nowrap=""><a onclick="javascript:window.open('IWViewer.jsp?id=3.5599976.5599976');" title="Open Image" href="javascript:doNothing();"><img title="Open Image" border="0" alt="Open Image" src="URL.gif"></a> <a onclick="javascript:window.open('URL','_newtab');" title="Open Workitem" href="javascript:doNothing();"><img title="Open Workitem" border="0" alt="Open Workitem" src="URL.gif"></a>
</td><td scope="row" nowrap=""><a href="URL" target="_Blank">12345</a></td>
<td nowrap=""><a href="URL" target="_Blank">28/08/2018 17:00:49</a></td>
<td nowrap=""><a href="URL" target="_Blank">11/09/2018 16:28:39</a></td>
<td nowrap=""><a href="URL" target="_Blank">5,599,976</a></td>
<td nowrap=""><a href="URL" target="_Blank">dijm</a></td></tr>
<tr onclick="changeHighlight( 'eid1' )" id="eid1" class="queryunshaded">
<td nowrap=""><a onclick="javascript:window.open('IWViewer.jsp?id=3.6443276.6443276');" title="Open Image" href="javascript:doNothing();"><img title="Open Image" border="0" alt="Open Image" src="URL.gif"></a> <a onclick="javascript:window.open('URL;id=3.6443276.6443276','_newtab');" title="Open Workitem" href="javascript:doNothing();"><img title="Open Workitem" border="0" alt="Open Workitem" src="URL.gif"></a>
</td><td scope="row" nowrap=""><a href="URL" target="_Blank">67890</a></td>
<td nowrap=""><a href="URL" target="_Blank">25/06/2019 11:01:01</a></td>
<td nowrap=""><a href="URL" target="_Blank">09/07/2019 10:32:32</a></td>
<td nowrap=""><a href="URL" target="_Blank">6,443,276</a></td>
<td nowrap=""><a href="URL" target="_Blank"></a></td></tr>
<tr onclick="changeHighlight( 'eid2' )" id="eid2" class="queryshaded">
<td nowrap=""><a onclick="javascript:window.open('IWViewer.jsp?id=3.6443287.6443287');" title="Open Image" href="javascript:doNothing();"><img title="Open Image" border="0" alt="Open Image" src="URL.gif"></a> <a onclick="javascript:window.open('URL;id=3.6443287.6443287','_newtab');" title="Open Workitem" href="javascript:doNothing();"><img title="Open Workitem" border="0" alt="Open Workitem" src="URL.gif"></a>
</td><td scope="row" nowrap=""><a href="URL" target="_Blank">23456</a></td>
<td nowrap=""><a href="URL" target="_Blank">25/06/2019 11:01:24</a></td>
<td nowrap=""><a href="URL" target="_Blank">09/07/2019 10:35:30</a></td>
<td nowrap=""><a href="URL" target="_Blank">6,443,287</a></td>
<td nowrap=""><a href="URL" target="_Blank"></a></td></tr>
<tr onclick="changeHighlight( 'eid3' )" id="eid3" class="queryunshaded">
<td nowrap=""><a onclick="javascript:window.open('IWViewer.jsp?id=3.6443339.6443339');" title="Open Image" href="javascript:doNothing();"><img title="Open Image" border="0" alt="Open Image" src="URL.gif"></a> <a onclick="javascript:window.open('URL;id=3.6443339.6443339','_newtab');" title="Open Workitem" href="javascript:doNothing();"><img title="Open Workitem" border="0" alt="Open Workitem" src="URL.gif"></a>
</td><td scope="row" nowrap=""><a href="URL" target="_Blank">78901</a></td>
<td nowrap=""><a href="URL" target="_Blank">25/06/2019 11:06:02</a></td>
<td nowrap=""><a href="URL" target="_Blank">09/07/2019 10:40:39</a></td>
<td nowrap=""><a href="URL" target="_Blank">6,443,339</a></td>
<td nowrap=""><a href="URL" target="_Blank"></a></td></tr>
<tr onclick="changeHighlight( 'eid4' )" id="eid4" class="queryshaded">
<td nowrap=""><a onclick="javascript:window.open('IWViewer.jsp?id=3.6443344.6443344');" title="Open Image" href="javascript:doNothing();"><img title="Open Image" border="0" alt="Open Image" src="URL.gif"></a> <a onclick="javascript:window.open('URL;id=3.6443344.6443344','_newtab');" title="Open Workitem" href="javascript:doNothing();"><img title="Open Workitem" border="0" alt="Open Workitem" src="URL.gif"></a>
</td><td scope="row" nowrap=""><a href="URL" target="_Blank">34567</a></td>
<td nowrap=""><a href="URL" target="_Blank">25/06/2019 11:06:17</a></td>
<td nowrap=""><a href="URL" target="_Blank">09/07/2019 10:40:43</a></td>
<td nowrap=""><a href="URL" target="_Blank">6,443,344</a></td>
<td nowrap=""><a href="URL" target="_Blank"></a></td></tr>
I've tried various solutions along these lines: https://www.ozgrid.com/forum/forum/other-software-applications/excel-and-web-browsers-help/131683-extracting-data-from-a-grid-on-webpage and Scraping data from website using vba
I also tried defining the frame itself to get the information out of it (again: I'm new to Excel VBA):
'set myHTMLDoc to the main pages IE document
Dim myHTMLDoc As HTMLDocument
Set myHTMLDoc = ie.Document
'set myHTMLFrame2 as the 2nd frame of the main page (index starts at 0)
Dim myHTMLFrame2 As HTMLDocument
Set myHTMLFrame2 = myHTMLDoc.Frames(1).Document
With the block above I get "Run-time error '438'"; without it I get "Run-time error '1004'".
The information I ultimately want is in each row:
</td><td scope="row" nowrap=""><a href="URL" target="_Blank">67890</a></td>
<td nowrap=""><a href="URL" target="_Blank">25/06/2019 11:01:01</a></td>
<td nowrap=""><a href="URL" target="_Blank">09/07/2019 10:32:32</a></td>
<td nowrap=""><a href="URL" target="_Blank">6,443,276</a></td>
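Ideally I'd like to dump each element into its own cell:
67890 | 25/06/2019 11:01:01 | 09/07/2019 10:32:32 | 6,443,276
There are 20 rows on each page (there's a button to press to go to the next page, which I'll figure out later... hopefully, haha).
Massive hat tip and thanks to anyone who can help :)
-EDIT- This is the code I'm currently using (I'm not precious about it :P)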
Private Sub CommandButton1_Click()
    Dim ie As Object
    Dim hTable As Object
    Dim objElementTR As Object
    Dim objTR As Object
    Dim objElementsTD As Object
    Dim objTD As Object
    Dim myarray() As Variant
    Dim intRow As Long
    Dim intCol As Long

    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True
    ie.Navigate "URL"
    ' loop until page is loaded
    Do Until (ie.ReadyState = 4 And Not ie.Busy)
        DoEvents
    Loop

    ' get the table with id "oTable" inside the frame titled "queue"
    Set hTable = ie.Document.querySelector("[title=queue]").contentDocument.getElementById("oTable")
    Set objElementTR = hTable.getElementsByTagName("tr")
    If objElementTR.Length = 0 Then Exit Sub

    ' one array row per <tr>; 10 columns is more than the 6 cells per row
    ReDim myarray(1 To objElementTR.Length, 1 To 10)
    For Each objTR In objElementTR
        intRow = intRow + 1
        intCol = 0
        Set objElementsTD = objTR.getElementsByTagName("td")
        For Each objTD In objElementsTD
            intCol = intCol + 1
            myarray(intRow, intCol) = objTD.innerText
        Next objTD
    Next objTR

    ' write the array below the last used cell in column A
    With Sheets(1).Cells(Rows.Count, "A").End(xlUp).Offset(1, 0)
        .Resize(UBound(myarray, 1), UBound(myarray, 2)).Value = myarray
    End With
End Sub
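Answer 0: (score: 0)
You can try isolating the frame by its title attribute, then go through contentDocument and grab the table by its id:
ie.document.querySelector("[title=queue]").contentDocument.querySelector("#oTable")
The trailing .querySelector("#oTable") is interchangeable with .getElementById("oTable").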
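As a rough illustration of that lookup, here is a minimal sketch. It assumes ie is an InternetExplorer.Application instance that has already finished loading the page, and that the frame is served from the same origin as the parent page (otherwise contentDocument is typically not accessible). GetQueueTable is a hypothetical helper name chosen for this example, not part of any library.

```vba
' Hypothetical helper: returns the table with id "oTable" inside the
' frame whose title attribute is "queue", or Nothing if either is missing.
Public Function GetQueueTable(ByVal ie As Object) As Object
    Dim frameEl As Object
    Dim frameDoc As Object

    ' Isolate the frame element by its title attribute
    Set frameEl = ie.document.querySelector("[title=queue]")
    If frameEl Is Nothing Then Exit Function

    ' Step into the document hosted by that frame
    Set frameDoc = frameEl.contentDocument
    If frameDoc Is Nothing Then Exit Function

    ' Equivalent alternative: frameDoc.querySelector("#oTable")
    Set GetQueueTable = frameDoc.getElementById("oTable")
End Function
```

Checking for Nothing at each step makes it easier to tell whether the frame or the table is the part that could not be found.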
I would then dump the table's .outerHTML to the worksheet via the clipboard, so that the table is pasted straight into the sheet.
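A minimal sketch of the clipboard step, under a few assumptions: hTable is the table element obtained from the frame, target is the top-left cell to paste into, and the late-bound class id below (the one commonly used to create an MSForms DataObject without adding a reference) is available on the machine. WriteTableViaClipboard is a hypothetical name used only for this example.

```vba
' Sketch: copy a table's outerHTML to the clipboard and paste it into a sheet.
Public Sub WriteTableViaClipboard(ByVal hTable As Object, ByVal target As Range)
    Dim clipboard As Object

    ' Late-bound MSForms DataObject, so no Microsoft Forms 2.0 reference is required
    Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")

    ' Put the table markup on the clipboard as text
    clipboard.SetText hTable.outerHTML
    clipboard.PutInClipboard

    ' Excel is expected to split the pasted HTML table into rows and columns
    target.PasteSpecial
End Sub
```

It could be called along these lines:

```vba
WriteTableViaClipboard ie.document.querySelector("[title=queue]").contentDocument.getElementById("oTable"), ThisWorkbook.Worksheets(1).Range("A1")
```

Note that pasting lets Excel interpret the cell contents, so dd/mm/yyyy dates and numbers with thousands separators may be converted according to the local regional settings. All six columns are pasted, so the first (icon) column and the last column would need deleting afterwards if only the four middle values are wanted.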