How do I fix a "For Each" iteration that uses getElementsByTagName?

Time: 2019-07-19 14:31:49

Tags: html excel vba web-scraping msxml

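I am using MSXML and WinHTTP in VBA / Excel. I am trying to extract the inner text from all the <p> tag elements inside the element with class article-content.

How can this sub iterate through all the <p> tags in a specific class and populate the worksheet?

Thanks.

I am trying to adapt this strategy [0] to this site [1].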

[0] https://codingislove.com/parse-html-in-excel-vba/ [1] https://www.fool.com/earnings/call-transcripts/2019/07/17/netflix-inc-nflx-q2-2019-earnings-call-transcript.aspx

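1 answer:

Answer 0: (score: 2)

There is actually only one element with the class name article-content, so your outer loop runs only once and nothing happens beyond i = 1. Also, inside that first loop you are reassigning the variable you are looping over, which will most likely cause errors.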

For Each para In paras
    Set para = para.getElementsByTagName("p")(i)

In the above, para is your loop variable, and the Set statement overwrites it.
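As a rough sketch of the safer pattern, the variable that drives the loop can be kept separate from the variable that holds each inner node. This assumes a small made-up HTML string in place of the real page and a late-bound htmlfile document; the names container and node are only illustrative.

```vba
Option Explicit

Public Sub LoopWithoutReassigning()
    ' Late-bound MSHTML document, so no project reference is needed.
    Dim html As Object, containers As Object
    Dim container As Object, node As Object

    Set html = CreateObject("htmlfile")
    ' Hypothetical sample markup standing in for the real page.
    html.body.innerHTML = "<div><p>one</p><p>two</p></div>"

    Set containers = html.getElementsByTagName("div")
    For Each container In containers
        ' container is never reassigned; node holds each <p> instead.
        For Each node In container.getElementsByTagName("p")
            Debug.Print node.innerText
        Next node
    Next container
End Sub
```

Running it prints the text of each sample paragraph to the Immediate window.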

Furthermore, the collection returned by para.getElementsByTagName("p") is zero-based, so its first item is at index 0.
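To illustrate the zero-based indexing, here is a minimal sketch. It assumes made-up sample HTML and a late-bound htmlfile document rather than the real page, and it only prints the row that each item would map to instead of writing to a worksheet.

```vba
Option Explicit

Public Sub ShowZeroBasedIndexing()
    ' Late-bound MSHTML document, so no project reference is needed.
    Dim html As Object, items As Object, i As Long

    Set html = CreateObject("htmlfile")
    ' Hypothetical sample markup standing in for the real page.
    html.body.innerHTML = "<div><p>first</p><p>second</p></div>"

    Set items = html.getElementsByTagName("p")
    ' Valid indexes run from 0 to Length - 1; worksheet rows start at 1.
    For i = 0 To items.Length - 1
        Debug.Print "index " & i & " -> row " & (i + 1) & ": " & items.Item(i).innerText
    Next i
End Sub
```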

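Here is how the code could work if you index into the initial collection returned by getElementsByClassName, chain getElementsByTagName onto that, and use the result as the collection for the For Each. The counter i starts at 1 because it is used to write to the correct row, and the loop variable para gives you the current node's innerText: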

Option Explicit
' Requires a reference to Microsoft HTML Object Library (VBE > Tools > References).
Public Sub TryKeywordSearch()
    Dim http As Object, html As New HTMLDocument
    Dim paras As Object, para As Object, i As Long

    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", "https://www.fool.com/earnings/call-transcripts/2019/07/17/netflix-inc-nflx-q2-2019-earnings-call-transcript.aspx", False
    http.send
    html.body.innerHTML = http.responseText
    ' Take the single article-content element (index 0), then every <p> inside it.
    Set paras = html.getElementsByClassName("article-content")(0).getElementsByTagName("p")
    i = 1 ' first worksheet row to write to
    For Each para In paras
        ThisWorkbook.Worksheets("Sheet1").Cells(i, 1).Value = para.innerText
        i = i + 1
    Next
End Sub

Alternatively, you can use a CSS selector combination, which is faster and IMO more readable, to get all the p tags inside the parent with class article-content:

Option Explicit
' Requires a reference to Microsoft HTML Object Library (VBE > Tools > References).
Public Sub GetParagraphs()
    Dim html As HTMLDocument, paragraphs As Object, i As Long
    Set html = New HTMLDocument

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.fool.com/earnings/call-transcripts/2019/07/17/netflix-inc-nflx-q2-2019-earnings-call-transcript.aspx", False
        .send
        html.body.innerHTML = .responseText
    End With
    ' Descendant combinator: every <p> inside an element with class article-content.
    Set paragraphs = html.querySelectorAll(".article-content p")
    ' The node list is zero-based, so item i goes to row i + 1.
    For i = 0 To paragraphs.Length - 1
        ThisWorkbook.Worksheets("Sheet1").Cells(i + 1, 1).Value = paragraphs.item(i).innerText
    Next i
End Sub