VBA Web Scrape(getelementsbyclassname)

时间:2017-01-25 12:02:08

标签: excel vba excel-vba web-scraping

我正在尝试搜索以下链接右侧窗格中提供的VBA课程项目列表" www.tutorialspoint.com/vba/index.htm"

但由于某些错误,我无法删除该列表:

Sub tutorailpointsscrap()
      Dim ie As InternetExplorer

      Set ie = New InternetExplorer

      With ie
      .navigate "https://www.tutorialspoint.com//vba/index.htm"
      .Visible = True
      Do While ie.readyState <> READYSTATE_COMPLETE
      DoEvents
      Loop
      End With

      Dim html As HTMLDocument
      Set html = ie.document


      Dim ele As IHTMLElement

      Dim lists As IHTMLElementCollection
      Dim row As Long

      Set ele = html.getElementsByClassName("nav nav-list primary left-menu")

      Set lists = ele.getElementsByTagName("a")
      row = 1


      For Each li In lists
      Cells(row, 1) = li.innerText
      row = row + 1
      Next

      ie.Quit

  End Sub

包含数据的HTML是:

<ul class="nav nav-list primary left-menu">
<li class="heading">VBA Tutorial</li>
<li><a href="/vba/index.htm" style="background-color: rgb(214, 214, 214);">VBA - Home</a></li>
<li><a href="/vba/vba_overview.htm">VBA - Overview</a></li>
<li><a href="/vba/vba_excel_macros.htm">VBA - Excel Macros</a></li>
<li><a href="/vba/vba_excel_terms.htm">VBA - Excel Terms</a></li>
<li><a href="/vba/vba_macro_comments.htm">VBA - Macro Comments</a></li>
<li><a href="/vba/vba_message_box.htm">VBA - Message Box</a></li>
<li><a href="/vba/vba_input_box.htm">VBA - Input Box</a></li>
<li><a href="/vba/vba_variables.htm">VBA - Variables</a></li>
<li><a href="/vba/vba_constants.htm">VBA - Constants</a></li>
<li><a href="/vba/vba_operators.htm">VBA - Operators</a></li>
<li><a href="/vba/vba_decisions.htm">VBA - Decisions</a></li>
<li><a href="/vba/vba_loops.htm">VBA - Loops</a></li>
<li><a href="/vba/vba_strings.htm">VBA - Strings</a></li>
<li><a href="/vba/vba_date_time.htm">VBA - Date and Time</a></li>
<li><a href="/vba/vba_arrays.htm">VBA - Arrays</a></li>
<li><a href="/vba/vba_functions.htm">VBA - Functions</a></li>
<li><a href="/vba/vba_sub_procedure.htm">VBA - SubProcedure</a></li>
<li><a href="/vba/vba_events.htm">VBA - Events</a></li>
<li><a href="/vba/vba_error_handling.htm">VBA - Error Handling</a></li>
<li><a href="/vba/vba_excel_objects.htm">VBA - Excel Objects</a></li>
<li><a href="/vba/vba_text_files.htm">VBA - Text Files</a></li>
<li><a href="/vba/vba_programming_charts.htm">VBA - Programming Charts</a></li>
<li><a href="/vba/vba_userforms.htm">VBA - Userforms</a></li>
</ul>

4 个答案:

答案 0 :(得分:1)

如果我正确地解决了您的问题,您需要以下内容。 HTH

Dim lists As IHTMLElementCollection
Dim anchorElements As IHTMLElementCollection
Dim ulElement As HTMLUListElement
Dim liElement As HTMLLIElement
Dim row As Long

Set lists = html.getElementsByClassName("nav nav-list primary left-menu")
row = 1

For Each ulElement In lists
    For Each liElement In ulElement.getElementsByTagName("li")
        Set anchorElements = liElement.getElementsByTagName("a")
        If anchorElements.Length > 0 Then
            Cells(row, 1) = anchorElements.Item(0).innerText
            row = row + 1
        End If
    Next liElement
Next ulElement
  

此结果(适用于所有列表)

VBA - Home
VBA - Overview
VBA - Excel Macros
VBA - Excel Terms
VBA - Macro Comments
VBA - Message Box
VBA - Input Box
VBA - Variables
VBA - Constants
VBA - Operators
VBA - Decisions
VBA - Loops
VBA - Strings
VBA - Date and Time
VBA - Arrays
VBA - Functions
VBA - SubProcedure
VBA - Events
VBA - Error Handling
VBA - Excel Objects
VBA - Text Files
VBA - Programming Charts
VBA - Userforms
VBA - Quick Guide
VBA - Useful Resources
VBA - Discussion
Developer's Best Practices
Questions and Answers
Effective Resume Writing
HR Interview Questions
Computer Glossary
Who is Who

如果您想要第一个列表的锚点内容,那么就像这样。

For Each liElement In lists.Item(0).getElementsByTagName("li")
    Set anchorElements = liElement.getElementsByTagName("a")
    If anchorElements.Length > 0 Then
        Cells(row, 1) = anchorElements.Item(0).innerText
        row = row + 1
    End If
Next liElement

答案 1 :(得分:0)

这个怎么样:

Sub TutorailsPoint()
Const URL = "https://www.tutorialspoint.com//vba/index.htm"
Dim http As New MSXML2.XMLHTTP60, html As New HTMLDocument
Dim topics As Object, posts As Object, topic As Object
Dim x As Long

x = 2

http.Open "GET", URL, False
http.send
html.body.innerHTML = http.responseText

Set topics = html.getElementsByClassName("nav nav-list primary left-menu")
For Each posts In topics
    For Each topic In posts.getElementsByTagName("a")
        Cells(x, 1) = topic.innerText
        x = x + 1
    Next topic
Next posts
End Sub

答案 2 :(得分:0)

这是您可能喜欢的另一种方式。它只会提供教程而不是其他内容:

Sub TpData()
    Const URL = "https://www.tutorialspoint.com//vba/index.htm"
    Dim http As New XMLHTTP60, html As New HTMLDocument, post As Object

    With http
        .Open "GET", URL, False
        .send
        html.body.innerHTML = .responseText
    End With

    For Each post In html.getElementsByClassName("left-menu")(1).getElementsByTagName("li")
        With post.getElementsByTagName("a")
            If .Length Then i = i + 1: Cells(i, 1) = .item(0).innerText
        End With
    Next post
End Sub

答案 3 :(得分:0)

子Button1_Click()

Dim internet As Object
Dim URL As String

Set internet = CreateObject("InternetExplorer.Application")
internet.Visible = True

For i = 2 To 3

URL = Sheets("Sheet2").Range("A" & i).Value
            internet.Navigate URL

 Application.Wait Now + TimeSerial(0, 0, 15)

 Do Until internet.ReadyState >= 4
    DoEvents
Loop


 Set a = internet.document
         Set lists = a.GetElementsByClassName("mg-results-td is-sv uk-flex uk-flex-middle")(0)
         'Range("B" & i).Value = e.NextSibling.innerText
         'Range("B" & i).Value = "Sajan"
         'For Each ulElement In lists
             Range("B" & i).Value = lists.innerText
         'Next ulElement

'internet.GoBack
Application.Wait Now + TimeSerial(0, 0, 50)

下一个我 结束