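Extracting HTML CDATA with VBA

Date: 2016-03-19 03:21:00

Tags: javascript xml excel excel-vba vba

I am trying to use Excel VBA to copy the URLs found inside the CDATA section from a list of web pages that all share the same HTML format. A sample of the HTML: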

<script>
//<![CDATA[
Wistia.iframeInit({"assets":[{"type":"original","slug":"original","display_name":
"Original file","ext":"mp4","size":2,"bitrate":2677,"public":true,
"url":"https://embed-ssl.wistia.com/deliveries/1.bin"},
{"type":"original","slug":"original","display_name":"Original file",
"ext":"mp4","size":1,"bitrate":2677,"public":true,
"url":"https://embed-ssl.wistia.com/deliveries/2.bin"},
//]]>
</script>

I can't seem to extract the CDATA content using Excel VBA alone. Every time I run the script below, I get either a blank result or "[object HTMLScriptElement]".

Sub test()

Dim ie As Object
Dim html As Object
Dim mylinks As Object
Dim link As Object
Dim lastRow As Integer
Dim myURL As String
Dim erow As Long

Set ie = CreateObject("InternetExplorer.Application")

lastRow = Sheet1.Cells(Rows.Count, "A").End(xlUp).Row
For i = 2 To lastRow
myURL = Sheet1.Cells(i, "A").Value
ie.navigate myURL
ie.Visible = False

While ie.readyState <> 4
DoEvents
Wend

Set html = ie.document
Set mylinks = html.getElementsByName("script")(1).innerText

For Each link In mylinks
erow = Worksheets("Sheet1").Cells(Rows.Count, 1).End(xlUp).Offset(1, 0).Row
Cells(erow, 1).Value = link
Cells(erow, 1).Columns.AutoFit
Next
End Sub

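1 answer:

Answer 0 (score: 0)

In my experience, automating Internet Explorer is very unreliable, so I would stick with XMLHTTP for as long as possible. Of course, your HTML tag soup is not XML, so it cannot be parsed as XML. But we can at least fetch the responseText via XMLHTTP and then work on it with string methods.

Example: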

Sub test()
 Dim sURL As String, sResponseText As String, sScriptPart As String
 Dim aScriptParts() As String
 Dim i As Long
 Dim oXMLHTTP As Object

 sURL = "https://fast.wistia.net/embed/iframe/vud7ff4i6w"

 Set oXMLHTTP = CreateObject("MSXML2.XMLHTTP")
 oXMLHTTP.Open "GET", sURL, False
 oXMLHTTP.Send

 sResponseText = oXMLHTTP.responseText

 ' Split the response into parts at each "<script" (case-insensitive).
 aScriptParts = Split(sResponseText, "<script", , vbTextCompare)
 ' Start at LBound + 1: the first part is the HTML before the first script, not a script.
 For i = LBound(aScriptParts) + 1 To UBound(aScriptParts)
  ' Only the text before "</script" belongs to the script.
  sScriptPart = Split(aScriptParts(i), "</script", , vbTextCompare)(0)
  MsgBox sScriptPart
 Next i
End Sub
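
Once a script part has been isolated this way, the CDATA content itself can be cut out with the same kind of string methods. The following is only a minimal sketch: `ExtractCData` is a hypothetical helper name, not part of the code above, and it assumes the markers appear literally as `<![CDATA[` and `]]>`, as in the HTML sample from the question.

```vba
' Hypothetical helper: returns the text between "<![CDATA[" and "]]>" in sText,
' or an empty string when either marker is missing.
Function ExtractCData(ByVal sText As String) As String
    Const OPEN_MARK As String = "<![CDATA["
    Const CLOSE_MARK As String = "]]>"
    Dim lStart As Long, lEnd As Long

    ' Find the opening marker and move just past it.
    lStart = InStr(1, sText, OPEN_MARK, vbBinaryCompare)
    If lStart = 0 Then Exit Function
    lStart = lStart + Len(OPEN_MARK)

    ' Find the first closing marker after the opening one.
    lEnd = InStr(lStart, sText, CLOSE_MARK, vbBinaryCompare)
    If lEnd = 0 Then Exit Function

    ExtractCData = Mid$(sText, lStart, lEnd - lStart)
End Function
```

Inside the loop it could be called as `MsgBox ExtractCData(sScriptPart)`. Because the sample closes the section with `//]]>`, the returned text would still end with the `//` comment marker, which may need to be trimmed separately.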

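Instead of the Split approach, you could also use regular expressions to separate the script parts from the full text. But you should ask a separate question for the RegEx experts; I am not one of them.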
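
For readers who want to try the regular-expression route anyway, a rough sketch using the late-bound `VBScript.RegExp` object might look like the following. `PrintScriptUrls` is a hypothetical name, and both patterns are assumptions based on the HTML sample in the question: the first captures the body of each script block, the second captures every `"url":"..."` value inside it. If the real response escapes slashes in the JSON (for example `\/`), the captured values would need unescaping.

```vba
' Hypothetical helper: prints every "url":"..." value found inside <script> blocks of sHtml.
Sub PrintScriptUrls(ByVal sHtml As String)
    Dim oScriptRe As Object, oUrlRe As Object
    Dim oScript As Object, oUrl As Object

    Set oScriptRe = CreateObject("VBScript.RegExp")
    oScriptRe.Global = True
    oScriptRe.IgnoreCase = True
    ' [\s\S] matches any character including line breaks;
    ' the lazy *? stops each match at the nearest "</script".
    oScriptRe.Pattern = "<script[^>]*>([\s\S]*?)</script"

    Set oUrlRe = CreateObject("VBScript.RegExp")
    oUrlRe.Global = True
    ' As a regex this reads: "url":"([^"]+)"
    oUrlRe.Pattern = """url"":""([^""]+)"""

    For Each oScript In oScriptRe.Execute(sHtml)
        For Each oUrl In oUrlRe.Execute(oScript.SubMatches(0))
            Debug.Print oUrl.SubMatches(0)
        Next oUrl
    Next oScript
End Sub
```

With the XMLHTTP example above, this could be called as `PrintScriptUrls sResponseText`; the results appear in the Immediate window rather than in a message box.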