VBA: Web scraping using <ul>, <li>, <div> and <span>

Time: 2019-11-29 11:32:45

Tags: html excel vba web-scraping

I am using VBA to extract data from HTML in which the values sit in <span> elements inside <div> elements inside <li> items of a <ul>.

I am trying to extract the date and the matter from the HTML. In Excel, the date should go in column A and the matter in column B.

The drawback of my code is that it pulls all the dates and matters into a single cell.

Sub GetDat()
    Dim IE As New InternetExplorer, html As HTMLDocument
    Dim elem As Object, data As String

    With IE
        .Visible = True
        .navigate "https://www.MyURL/sc/wo/Worders/index?id=76888564"
        Do While .readyState <> READYSTATE_COMPLETE: Loop
        Set html = .document
    End With

    data = ""

    For Each elem In html.getElementsByClassName("simple-list")(0).getElementsByTagName("li")
        data = data & " " & elem.innerText
    Next elem

    Range("A1").Value = data

    IE.Quit
End Sub

The output I need is shown in the image:

HTML:

<ul class="simple-list">
    <!-- ko foreach: $root.workOrderNotes-->
    <li>
        <!-- Note # + Date/Time -->
        <div class="wo-notes-col-1">
            <span class="wo-notes-num" data-bind="text: $root.workOrderNotesCount() - $index()">3</span>
            <span class="wo-note-date">
                                <span data-bind="text:createdDate().length > timeWithoutTimeZoneLength ? createdDate() : createdDate() + ' EST'">Dec 13 2016 23:30 CST  </span>
            <span class="wo-note-action-required" style="display: none;" data-bind="visible: actionRequired() == 'Yes'">
                                    <strong>ACTION REQUIRED</strong>
                                </span>
            </span>
            <span class="wo-note-action-required mobile" style="display: none;" data-bind="visible: actionRequired() == 'Yes'">
                                <strong>ACTION REQUIRED</strong>
                            </span>
        </div>
        <!-- Everything Else -->
        <div class="wo-notes-col-2">
            <!-- Created By/FollowUp/Scheduled -->
            <div class="wo-notes-details">
                <div class="wo-notes-details-main">
                    <div class="SC_label">Created By</div>
                    <div class="wo-notes-details-createdBy" data-bind="text: createdBy()">Feedback auto-process</div>
                    <div data-bind="text: companyName()">Rajsha STORES, INC</div>
                    <div class="wo-notes" data-bind="html: note()">At this time the work order has been updated to a billable status. Your company can process an invoice against the work order. For any questions regarding invoicing, refer to the Invoice instructions on the Help Link in ServiceChannel.</div>
                </div>
                <div class="wo-notes-details-dates">
                    <div style="display: none;" data-bind="visible: followUpDate()">
                        <div class="SC_label">Follow Up</div>
                        <div class="wo-notes-details-followUp" data-bind="text: followUpDate()"></div>
                    </div>
                    <div style="display: none;" data-bind="visible: scheduledDate()">
                        <div class="SC_label">Scheduled</div>
                        <div class="wo-notes-details-scheduled" data-bind="text: scheduledDate()"></div>
                    </div>
                </div>
            </div>
            <!-- FollowUp/Scheduled for mobile -->
            <div class="wo-notes-details-dates mobile">
                <div style="display: none;" data-bind="visible: followUpDate()">
                    <span class="SC_label">Follow Up</span>
                    <span class="wo-notes-details-followUp" data-bind="text: followUpDate()"></span>
                </div>
                <div style="display: none;" data-bind="visible: scheduledDate()">
                    <span class="SC_label">Scheduled</span>
                    <span class="wo-notes-details-scheduled" data-bind="text: scheduledDate()"></span>
                </div>
            </div>
            <!-- Mailed To -->
            <div style="display: none;" data-bind="visible: mailedTo()">
                <a class="wo-notes-email" href="#" data-bind="text: mailedTo()"></a>
            </div>
        </div>
    </li>

    <li>
        <!-- Note # + Date/Time -->
        <div class="wo-notes-col-1">
            <span class="wo-notes-num" data-bind="text: $root.workOrderNotesCount() - $index()">2</span>
            <span class="wo-note-date">
                                <span data-bind="text:createdDate().length > timeWithoutTimeZoneLength ? createdDate() : createdDate() + ' EST'">Dec 11 2016 02:15 CST  </span>
            <span class="wo-note-action-required" style="display: none;" data-bind="visible: actionRequired() == 'Yes'">
                                    <strong>ACTION REQUIRED</strong>
                                </span>
            </span>
            <span class="wo-note-action-required mobile" style="display: none;" data-bind="visible: actionRequired() == 'Yes'">
                                <strong>ACTION REQUIRED</strong>
                            </span>
        </div>
        <!-- Everything Else -->
        <div class="wo-notes-col-2">
            <!-- Created By/FollowUp/Scheduled -->
            <div class="wo-notes-details">
                <div class="wo-notes-details-main">
                    <div class="SC_label">Created By</div>
                    <div class="wo-notes-details-createdBy" data-bind="text: createdBy()">Auto-Update Procedure</div>
                    <div data-bind="text: companyName()">Rajsha STORES, INC</div>
                    <div class="wo-notes" data-bind="html: note()">Status changed by auto-update procedure.</div>
                </div>
                <div class="wo-notes-details-dates">
                    <div style="display: none;" data-bind="visible: followUpDate()">
                        <div class="SC_label">Follow Up</div>
                        <div class="wo-notes-details-followUp" data-bind="text: followUpDate()"></div>
                    </div>
                    <div data-bind="visible: scheduledDate()">
                        <div class="SC_label">Scheduled</div>
                        <div class="wo-notes-details-scheduled" data-bind="text: scheduledDate()">Dec 10 2016 23:59 CST </div>
                    </div>
                </div>
            </div>
            <!-- FollowUp/Scheduled for mobile -->
            <div class="wo-notes-details-dates mobile">
                <div style="display: none;" data-bind="visible: followUpDate()">
                    <span class="SC_label">Follow Up</span>
                    <span class="wo-notes-details-followUp" data-bind="text: followUpDate()"></span>
                </div>
                <div data-bind="visible: scheduledDate()">
                    <span class="SC_label">Scheduled</span>
                    <span class="wo-notes-details-scheduled" data-bind="text: scheduledDate()">Dec 10 2016 23:59 CST  </span>
                </div>
            </div>
            <!-- Mailed To -->
            <div style="display: none;" data-bind="visible: mailedTo()">
                <a class="wo-notes-email" href="#" data-bind="text: mailedTo()"></a>
            </div>
        </div>
    </li>

    <li>
        <!-- Note # + Date/Time -->
        <div class="wo-notes-col-1">
            <span class="wo-notes-num" data-bind="text: $root.workOrderNotesCount() - $index()">1</span>
            <span class="wo-note-date">
                                <span data-bind="text:createdDate().length > timeWithoutTimeZoneLength ? createdDate() : createdDate() + ' EST'">Dec 01 2016 01:51 CST  </span>
            <span class="wo-note-action-required" style="display: none;" data-bind="visible: actionRequired() == 'Yes'">
                                    <strong>ACTION REQUIRED</strong>
                                </span>
            </span>
            <span class="wo-note-action-required mobile" style="display: none;" data-bind="visible: actionRequired() == 'Yes'">
                                <strong>ACTION REQUIRED</strong>
                            </span>
        </div>
        <!-- Everything Else -->
        <div class="wo-notes-col-2">
            <!-- Created By/FollowUp/Scheduled -->
            <div class="wo-notes-details">
                <div class="wo-notes-details-main">
                    <div class="SC_label">Created By</div>
                    <div class="wo-notes-details-createdBy" data-bind="text: createdBy()">PM Auto Dispatch</div>
                    <div data-bind="text: companyName()">Rajsha STORES, INC</div>
                    <div class="wo-notes" data-bind="html: note()">Auto Dispatch summary sent to Rajsha@divisionsinc.com on Dec 01, 2016 02:50 EST</div>
                </div>
                <div class="wo-notes-details-dates">
                    <div style="display: none;" data-bind="visible: followUpDate()">
                        <div class="SC_label">Follow Up</div>
                        <div class="wo-notes-details-followUp" data-bind="text: followUpDate()"></div>
                    </div>
                    <div style="display: none;" data-bind="visible: scheduledDate()">
                        <div class="SC_label">Scheduled</div>
                        <div class="wo-notes-details-scheduled" data-bind="text: scheduledDate()"></div>
                    </div>
                </div>
            </div>
            <!-- FollowUp/Scheduled for mobile -->
            <div class="wo-notes-details-dates mobile">
                <div style="display: none;" data-bind="visible: followUpDate()">
                    <span class="SC_label">Follow Up</span>
                    <span class="wo-notes-details-followUp" data-bind="text: followUpDate()"></span>
                </div>
                <div style="display: none;" data-bind="visible: scheduledDate()">
                    <span class="SC_label">Scheduled</span>
                    <span class="wo-notes-details-scheduled" data-bind="text: scheduledDate()"></span>
                </div>
            </div>
            <!-- Mailed To -->
            <div style="display: none;" data-bind="visible: mailedTo()">
                <a class="wo-notes-email" href="#" data-bind="text: mailedTo()"></a>
            </div>
        </div>
    </li>
    <!-- /ko -->
</ul>

1 answer:

Answer 0 (score: 0)

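You can grab two nodeLists, one for the dates and one for the matters, then loop over them and write the values out to the sheet. Match dates on the data-bind attribute value and matters on the class name.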

Example that writes the values to columns A and B:

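' Assumes ie is the InternetExplorer instance from the question, with the page fully loaded
Dim dates As Object, matters As Object, i As Long, ws As Worksheet

Set ws = ThisWorkbook.Worksheets("Sheet1")
' Date spans: data-bind attribute starts with "text:createdDate"
' (a narrower alternative: ".wo-notes-col-1 [data-bind^='text:createdDate']")
Set dates = ie.document.querySelectorAll("[data-bind^='text:createdDate']")
' Matter text: elements with class "wo-notes"
Set matters = ie.document.querySelectorAll(".wo-notes")

With ws
    ' Both lists are in document order, so index i pairs a date with its matter
    For i = 0 To dates.Length - 1
        .Cells(i + 1, 1).Value = Trim$(dates.Item(i).innerText)
        .Cells(i + 1, 2).Value = Trim$(matters.Item(i).innerText)
    Next i
End With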
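
If the two lists could ever differ in length, pairing by index would put a matter next to the wrong date. A variant worth considering is to loop over each <li> and read both values from inside that item. The following is only a sketch under assumptions: the sub name GetNotesPerItem and the sheet name "Sheet1" are made up for illustration, the project references Microsoft Internet Controls and Microsoft HTML Object Library (as the question's code already implies), and the page renders in a document mode where getElementsByClassName is available on elements. It reuses only the DOM calls already shown in the question.

```vba
' Sketch only: GetNotesPerItem and "Sheet1" are hypothetical names.
Sub GetNotesPerItem()
    Dim IE As New InternetExplorer
    Dim html As HTMLDocument
    Dim items As Object, li As Object
    Dim dateSpans As Object, noteDivs As Object
    Dim ws As Worksheet, r As Long

    Set ws = ThisWorkbook.Worksheets("Sheet1")

    With IE
        .Visible = True
        .navigate "https://www.MyURL/sc/wo/Worders/index?id=76888564"
        ' Yield to the browser while the page loads instead of spinning
        Do While .Busy Or .readyState <> READYSTATE_COMPLETE
            DoEvents
        Loop
        Set html = .document
    End With

    ' Same starting point as the question: every <li> of the first .simple-list
    Set items = html.getElementsByClassName("simple-list")(0).getElementsByTagName("li")

    r = 1
    For Each li In items
        Set dateSpans = li.getElementsByClassName("wo-note-date")
        Set noteDivs = li.getElementsByClassName("wo-notes")

        ' Skip any <li> that lacks either piece so rows stay aligned
        If dateSpans.Length > 0 And noteDivs.Length > 0 Then
            ' The first <span> inside .wo-note-date holds the date text
            ws.Cells(r, 1).Value = Trim$(dateSpans.Item(0).getElementsByTagName("span").Item(0).innerText)
            ws.Cells(r, 2).Value = Trim$(noteDivs.Item(0).innerText)
            r = r + 1
        End If
    Next li

    IE.Quit
End Sub
```

Because the list appears to be filled in by data-bind bindings, it may still be empty at the moment the browser reports the page as complete. This sketch does not handle that case, so an extra wait until the first <li> appears may be needed.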

References:

  1. document.querySelectorAll
  2. CSS selectors