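I am trying to use Excel VBA to copy the URLs that sit between the CDATA markers on a list of web pages that all share the same HTML format. A sample of the HTML: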
<script>
//<![CDATA[
Wistia.iframeInit({"assets":[{"type":"original","slug":"original","display_name":
"Original file","ext":"mp4","size":2,"bitrate":2677,"public":true,
"url":"https://embed-ssl.wistia.com/deliveries/1.bin"},
{"type":"original","slug":"original","display_name":"Original file",
"ext":"mp4","size":1,"bitrate":2677,"public":true,
"url":"https://embed-ssl.wistia.com/deliveries/2.bin"},
//]]>
</script>
I can't seem to extract the CDATA content using Excel VBA alone. Every time I run the script below, I get either a blank cell or "[object HTMLScriptElement]".
Sub test()
Dim ie As Object
Dim html As Object
Dim mylinks As Object
Dim link As Object
Dim lastRow As Integer
Dim myURL As String
Dim erow As Long
Set ie = CreateObject("InternetExplorer.Application")
lastRow = Sheet1.Cells(Rows.Count, "A").End(xlUp).Row
For i = 2 To lastRow
myURL = Sheet1.Cells(i, "A").Value
ie.navigate myURL
ie.Visible = False
While ie.readyState <> 4
DoEvents
Wend
Set html = ie.document
Set mylinks = html.getElementsByName("script")(1).innerText
For Each link In mylinks
erow = Worksheets("Sheet1").Cells(Rows.Count, 1).End(xlUp).Offset(1, 0).Row
Cells(erow, 1).Value = link
Cells(erow, 1).Columns.AutoFit
Next
End Sub
Answer 0 (score: 0)
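In my experience, automating Internet Explorer is very unreliable, so I would stick with XMLHTTP for as long as possible. Of course, your HTML tag soup is not XML, so it cannot be parsed as XML. But we can at least use XMLHTTP to fetch the responseText and then work on it with plain string methods.
Example: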
Sub test()
    Dim sURL As String, sResponseText As String, sScriptPart As String
    Dim aScriptParts As Variant
    Dim i As Long
    Dim oXMLHTTP As Object
    sURL = "https://fast.wistia.net/embed/iframe/vud7ff4i6w"
    Set oXMLHTTP = CreateObject("MSXML2.XMLHTTP")
    oXMLHTTP.Open "GET", sURL, False 'False = synchronous request
    oXMLHTTP.Send
    sResponseText = oXMLHTTP.responseText
    aScriptParts = Split(sResponseText, "<script", , vbTextCompare) 'split into parts delimited by <script
    For i = LBound(aScriptParts) + 1 To UBound(aScriptParts) 'LBound + 1 because the first part is not a script; it is the HTML before the first <script
        sScriptPart = Split(aScriptParts(i), "</script", , vbTextCompare)(0) 'only the part before </script belongs to the script
        MsgBox sScriptPart
    Next
End Sub
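To go from a script part to the actual URLs, the same text-method approach can be continued. The following is only a minimal sketch: ExtractUrls is a hypothetical helper name, and it assumes every value appears as "url":"..." exactly as in the sample from the question, with no escaped quotes inside the value.

```vba
' Hypothetical helper: collects every "url":"..." value found in sText.
' Assumes the values contain no escaped double quotes.
Function ExtractUrls(ByVal sText As String) As Collection
    Const URL_KEY As String = """url"":"""   ' the literal text "url":"
    Dim oResult As New Collection
    Dim lStart As Long, lEnd As Long

    lStart = InStr(1, sText, URL_KEY, vbTextCompare)
    Do While lStart > 0
        lStart = lStart + Len(URL_KEY)             ' first character of the value
        lEnd = InStr(lStart, sText, """")          ' closing quote of the value
        If lEnd = 0 Then Exit Do                   ' unterminated value: stop
        oResult.Add Mid$(sText, lStart, lEnd - lStart)
        lStart = InStr(lEnd + 1, sText, URL_KEY, vbTextCompare)
    Loop

    Set ExtractUrls = oResult
End Function
```

Inside the loop, ExtractUrls(sScriptPart) could then be iterated with For Each and each item written to a worksheet cell instead of being shown in a MsgBox.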
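You could also use regular expressions instead of the Split method to separate the script parts from the rest of the text. But for that you should ask a separate question aimed at RegEx experts. I am not one of them.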
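For orientation only, here is roughly what the regular-expression variant might look like with the late-bound VBScript.RegExp object. It is a sketch, not an expert-tuned pattern: ScriptBodies is a hypothetical helper name, and the pattern assumes that no script body itself contains the text </script.

```vba
' Hypothetical helper: returns the inner text of every <script> element in sHtml.
Function ScriptBodies(ByVal sHtml As String) As Collection
    Dim oRegEx As Object, oMatch As Object
    Dim oResult As New Collection

    Set oRegEx = CreateObject("VBScript.RegExp")   ' late-bound, no reference needed
    oRegEx.Global = True          ' find all matches, not just the first
    oRegEx.IgnoreCase = True      ' match <SCRIPT> as well as <script>
    ' [\s\S] stands for "any character including line breaks",
    ' because "." does not match line breaks in VBScript.RegExp.
    oRegEx.Pattern = "<script[^>]*>([\s\S]*?)</script"

    For Each oMatch In oRegEx.Execute(sHtml)
        oResult.Add oMatch.SubMatches(0)           ' text between the tags
    Next oMatch

    Set ScriptBodies = oResult
End Function
```

Unlike the Split version, the captured text here starts after the closing > of the opening tag, so attributes of the script tag are not included in the result.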