<frameset>和<frame>

时间:2019-07-02 23:23:50

标签: html excel vba web-scraping

我正在尝试将网站中的一些表格项粘贴到Excel中。

尽管我在Excel方面对VBA还是很陌生,但我通常对编码并不陌生:)

我尝试使用Excel的数据>从Web界面,它无法识别表格。我猜这是因为它是使用(或至少是我的Google-Fu使我理解的)构建的。

Snipping of what the second table looks like

<html>

<frame title="links" ...>...</frame>

<frame title="queue">
#document

<head>...</head>
<body>
<div id="container>
<script>...</script>
<div>

<table id="oTable">

<colgroup>...</colgroup>

<thead>...</thead>


<tbody>
  <tr onclick="changeHighlight( 'eid0' )" id="eid0" class="queryshaded">
    <td nowrap=""><a onclick="javascript:window.open('IWViewer.jsp?id=3.5599976.5599976');" title="Open Image" href="javascript:doNothing();"><img title="Open Image" border="0" alt="Open Image" src="URL.gif"></a>&nbsp;<a onclick="javascript:window.open('URL','_newtab');" title="Open Workitem" href="javascript:doNothing();"><img title="Open Workitem" border="0" alt="Open Workitem" src="URL.gif"></a>
    </td><td scope="row" nowrap=""><a href="URL" target="_Blank">12345</a></td>
    <td nowrap=""><a href="`" target="_Blank">28/08/2018 17:00:49</a></td>
    <td nowrap=""><a href="URL" target="_Blank">11/09/2018 16:28:39</a></td>
    <td nowrap=""><a href="URL" target="_Blank">5,599,976</a></td>
    <td nowrap=""><a href="URL" target="_Blank">dijm</a></td></tr>
  <tr onclick="changeHighlight( 'eid1' )" id="eid1" class="queryunshaded">
    <td nowrap=""><a onclick="javascript:window.open('IWViewer.jsp?id=3.6443276.6443276');" title="Open Image" href="javascript:doNothing();"><img title="Open Image" border="0" alt="Open Image" src="URL.gif"></a>&nbsp;<a onclick="javascript:window.open('URL;id=3.6443276.6443276','_newtab');" title="Open Workitem" href="javascript:doNothing();"><img title="Open Workitem" border="0" alt="Open Workitem" src="URL.gif"></a>
    </td><td scope="row" nowrap=""><a href="URL" target="_Blank">67890</a></td>
    <td nowrap=""><a href="URL" target="_Blank">25/06/2019 11:01:01</a></td>
    <td nowrap=""><a href="URL" target="_Blank">09/07/2019 10:32:32</a></td>
    <td nowrap=""><a href="URL" target="_Blank">6,443,276</a></td>
    <td nowrap=""><a href="URL" target="_Blank"></a></td></tr>
  <tr onclick="changeHighlight( 'eid2' )" id="eid2" class="queryshaded">
    <td nowrap=""><a onclick="javascript:window.open('IWViewer.jsp?id=3.6443287.6443287');" title="Open Image" href="javascript:doNothing();"><img title="Open Image" border="0" alt="Open Image" src="URL.gif"></a>&nbsp;<a onclick="javascript:window.open('URL;id=3.6443287.6443287','_newtab');" title="Open Workitem" href="javascript:doNothing();"><img title="Open Workitem" border="0" alt="Open Workitem" src="URL.gif"></a>
    </td><td scope="row" nowrap=""><a href="URL" target="_Blank">23456</a></td>
    <td nowrap=""><a href="URL" target="_Blank">25/06/2019 11:01:24</a></td>
    <td nowrap=""><a href="URL" target="_Blank">09/07/2019 10:35:30</a></td>
    <td nowrap=""><a href="URL" target="_Blank">6,443,287</a></td>
    <td nowrap=""><a href="URL" target="_Blank"></a></td></tr>
  <tr onclick="changeHighlight( 'eid3' )" id="eid3" class="queryunshaded">
    <td nowrap=""><a onclick="javascript:window.open('IWViewer.jsp?id=3.6443339.6443339');" title="Open Image" href="javascript:doNothing();"><img title="Open Image" border="0" alt="Open Image" src="URL.gif"></a>&nbsp;<a onclick="javascript:window.open('URL;id=3.6443339.6443339','_newtab');" title="Open Workitem" href="javascript:doNothing();"><img title="Open Workitem" border="0" alt="Open Workitem" src="URL.gif"></a>
    </td><td scope="row" nowrap=""><a href="URL" target="_Blank">78901</a></td>
    <td nowrap=""><a href="URL" target="_Blank">25/06/2019 11:06:02</a></td>
    <td nowrap=""><a href="URL" target="_Blank">09/07/2019 10:40:39</a></td>
    <td nowrap=""><a href="URL" target="_Blank">6,443,339</a></td>
    <td nowrap=""><a href="URL" target="_Blank"></a></td></tr>
  <tr onclick="changeHighlight( 'eid4' )" id="eid4" class="queryshaded">
    <td nowrap=""><a onclick="javascript:window.open('IWViewer.jsp?id=3.6443344.6443344');" title="Open Image" href="javascript:doNothing();"><img title="Open Image" border="0" alt="Open Image" src="URL.gif"></a>&nbsp;<a onclick="javascript:window.open('URL;id=3.6443344.6443344','_newtab');" title="Open Workitem" href="javascript:doNothing();"><img title="Open Workitem" border="0" alt="Open Workitem" src="URL.gif"></a>
    </td><td scope="row" nowrap=""><a href="URL" target="_Blank">34567</a></td>
    <td nowrap=""><a href="URL" target="_Blank">25/06/2019 11:06:17</a></td>
    <td nowrap=""><a href="URL" target="_Blank">09/07/2019 10:40:43</a></td>
    <td nowrap=""><a href="URL" target="_Blank">6,443,344</a></td>
    <td nowrap=""><a href="URL" target="_Blank"></a></td></tr>

我尝试了各种看起来像这样的解决方案: https://www.ozgrid.com/forum/forum/other-software-applications/excel-and-web-browsers-help/131683-extracting-data-from-a-grid-on-webpageScraping data from website using vba

并尝试定义框架本身以尝试从中获取信息? (再次:是Excel VBA的新功能)

    'set myHTMLDoc to the main pages IE document
    Dim myHTMLDoc As HTMLDocument
    Set myHTMLDoc = ie.Document

    'set myHTMLFrame2 as the 2nd frame of the main page (index starts at 0)
    Dim myHTMLFrame2 As HTMLDocument
    Set myHTMLFrame2 = myHTMLDoc.Frames(1).Document

使用上面的代码块,我得到了“运行时错误'438' 没有上面的块,我会收到“运行时错误'1004'

我最终想要的信息在每一行中:

    </td><td scope="row" nowrap=""><a href="URL" target="_Blank">67890</a></td>
    <td nowrap=""><a href="URL" target="_Blank">25/06/2019 11:01:01</a></td>
    <td nowrap=""><a href="URL" target="_Blank">09/07/2019 10:32:32</a></td>
    <td nowrap=""><a href="URL" target="_Blank">6,443,276</a></td>

理想情况下,我想将每个元素转储到单元格中

67890 | 25/06/2019 11:01:01 | 09/09/2019 10:32:32 | 6,443,276

每页上有20行(有一个按钮可以按下以转到下一页,稍后我会弄清楚...希望是哈哈)

大量提示,感谢任何可以帮助您的人:)

-编辑- 这是我目前正在使用的代码(对此并不珍贵:P)

Private Sub CommandButton1_Click()


    Dim ie     As Object
    Dim html   As Object
    Dim objElementTR As Object
    Dim objTR  As Object
    Dim objElementsTD As Object
    Dim objTD  As Object
    Dim result As String
    Dim intRow As Long
    Dim intCol As Long

    Set ie = CreateObject("InternetExplorer.Application")
    ie.Navigate "URL"
    ie.Visible = True     ' loop until page is loaded
    Do Until (ie.ReadyState = 4 And Not ie.Busy)
        DoEvents
    Loop

    'set myHTMLDoc to the main pages IE document
    Dim myHTMLDoc As HTMLDocument
    Set myHTMLDoc = ie.Document

    'set myHTMLFrame2 as the 2nd frame of the main page (index starts at 0)
    Dim myHTMLFrame2 As HTMLDocument
    Set myHTMLFrame2 = ie.Document.querySelector("[title=queue]").contentDocument.getElementById("oTable")

    result = myHTMLFrame2
    Set html = CreateObject("htmlfile")
    myHTMLFrame2 = result
    Set objElementTR = html.getElementsByTagName("tr")
    ReDim myarray(0 To objElementTR.Length, 0 To 10)
    For Each objTR In objElementTR
        intRow = intRow + 1
        Set objElementsTD = objTR.getElementsByTagName("td")
        For Each objTD In objElementsTD
            myarray(intRow, intCol) = objTD.innerText
            intCol = intCol + 1
        Next objTD
        intCol = 0
    Next objTR
    With Sheets(1).Cells(1, 1).Cells(Rows.Count, "A").End(xlUp).Offset(1, 0)
        .Resize(UBound(myarray), UBound(myarray, 2)).Value = myarray
    End With



End Sub

1 个答案:

答案 0 :(得分:0)

您可以尝试通过标题属性隔离框架,然后通过contentDocument并通过ID获取表格

ie.document.querySelector("[title=queue]").contentDocument.querySelector("#oTable")

然后可以将.querySelector("#oTable")的末尾与.getElementById("oTable")互换

然后我将转储表via clipboard.outerHTML,以便将表直接粘贴到工作表中。