I am trying to extract two pieces of data from a local HTML file located at C:\Sample.html. I used @QHarr's code from another, similar thread:
Sub Test()
    Dim html As HTMLDocument, post As Object, i As Long
    Set html = GetHTMLFileContent("C:\Sample.html")
    Set post = html.querySelectorAll("span.course-player__chapter-item__completion")
    For i = 0 To post.Length - 1
        ActiveSheet.Cells(i + 1, 1) = Trim(post.item(i).innerText)
        ActiveSheet.Cells(i + 1, 2) = post.item(i).PreviousSibling.innerText
    Next i
End Sub
Function GetHTMLFileContent(ByVal filePath As String) As HTMLDocument
    ' HTMLDocument requires a reference to the Microsoft HTML Object Library
    Dim fso As Object, hFile As Object, hString As String, html As HTMLDocument
    Set html = New HTMLDocument
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set hFile = fso.OpenTextFile(filePath)
    ' ReadAll returns the whole file in one call, so no loop is needed
    If Not hFile.AtEndOfStream Then hString = hFile.ReadAll()
    hFile.Close
    html.body.innerHTML = hString
    Set GetHTMLFileContent = html
End Function
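The code works and retrieves the inner text of the matched elements through post.item(i).innerText.
However, when I try to get the inner text of the previous sibling, nothing is returned.
Here is a snapshot of the HTML: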
<div class="course-player__chapter-item__header _chapter-item__header_d57kmg ui-accordion-header ui-corner-top ui-state-default ui-accordion-icons ui-accordion-header-active ui-state-active" role="tab" id="ui-id-1" aria-controls="ui-id-2" aria-selected="true" aria-expanded="true" tabindex="0"><span class="ui-accordion-header-icon ui-icon ui-icon-triangle-1-s"></span>
<h2 tabindex="-1" class="course-player__chapter-item__title _chapter-item__title_d57kmg">
<span class="course-player__progress _chapter-item__progress_d57kmg">
<span data-percentage-completion="100" class="_chapter-item__progress-ring_d57kmg">
<span class="progress-ring__ring _progress-ring__ring_jgsecr">
<span class="progress-ring__mask progress-ring--full _progress-ring__mask_jgsecr _progress-ring--full_jgsecr">
<span class="progress-ring--fill brand-color__background _progress-ring--fill_jgsecr"></span>
</span>
<span class="progress-ring__mask progress-ring--half _progress-ring__mask_jgsecr ">
<span class="progress-ring--fill brand-color__background _progress-ring--fill_jgsecr"></span>
<span class="progress-ring--fill progress-ring--fix _progress-ring--fill_jgsecr _progress-ring--fix_jgsecr"></span>
</span>
</span>
<span class="progress-ring__ring-inset _progress-ring__ring-inset_jgsecr"></span>
<span class="progress-ring__checkmark brand-color__text _progress-ring__checkmark_jgsecr"><i aria-label="Completed" class="toga-icon toga-icon-checkmark"></i></span>
</span>
</span>
INTRO TO VBA - Overview
<!---->
<span class="course-player__chapter-item__completion _chapter-item__completion_d57kmg">
10 / 10
</span>
<span class="course-player__chapter-item__toggle _chapter-item__toggle_d57kmg">
<i aria-hidden="true" class="chapter-item__toggle-icon toga-icon toga-icon-caret-stroke-down _chapter-item__toggle-icon_d57kmg"></i>
</span>
</h2>
</div>
Answer 0 (score: 0)
I used the CSS selector h2[class='course-player__chapter-item__title _chapter-item__title_d57kmg'],
which returns the full heading text, and then split the output into two columns:
Sub Test()
    Dim x, html As HTMLDocument, post As Object, s As String, i As Long
    Set html = GetHTMLFileContent("C:\Sample.html")
    Set post = html.querySelectorAll("h2[class='course-player__chapter-item__title _chapter-item__title_d57kmg']")
    For i = 0 To post.Length - 1
        ' Split the heading text, e.g. "INTRO TO VBA - Overview 10 / 10", into words
        x = Split(Trim(post.item(i).innerText), " ")
        ' The last three tokens are the completion count; keep them in their original order
        s = Join(Array(x(UBound(x) - 2), x(UBound(x) - 1), x(UBound(x))), " ")
        ' Drop those three tokens so that only the title remains
        ReDim Preserve x(0 To UBound(x) - 3)
        ActiveSheet.Cells(i + 1, 1) = Trim(Join(x, " "))
        ActiveSheet.Cells(i + 1, 2) = Trim(s)
    Next i
End Sub
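As for why the original attempt returned nothing: in the snapshot, the title "INTRO TO VBA - Overview" is a bare text node, and an empty comment (<!---->) sits between it and the completion span. The immediate previous sibling is therefore not an element carrying the title. A possible alternative to splitting the heading is to walk backwards through the siblings until a non-empty text node is found. The following is only a sketch that I have not run against the actual file; PreviousTextOf, CleanText and TestSiblingWalk are hypothetical names introduced here, and the sketch assumes that MSHTML exposes the title as a text node (nodeType 3) with its content in nodeValue.

```vba
' Hypothetical helper: replaces line breaks and tabs with spaces, then trims.
' (VBA's Trim only removes leading and trailing spaces.)
Function CleanText(ByVal s As String) As String
    s = Replace(s, vbCr, " ")
    s = Replace(s, vbLf, " ")
    s = Replace(s, vbTab, " ")
    CleanText = Trim$(s)
End Function

' Hypothetical helper: walks backwards from a node and returns the first
' text node (nodeType = 3) that contains more than whitespace.
' Returns an empty string if no such sibling exists.
Function PreviousTextOf(ByVal node As Object) As String
    Dim sib As Object, txt As String
    Set sib = node.previousSibling
    Do While Not sib Is Nothing
        If sib.nodeType = 3 Then
            txt = CleanText(sib.nodeValue)
            If Len(txt) > 0 Then
                PreviousTextOf = txt
                Exit Function
            End If
        End If
        ' Comments and elements are skipped
        Set sib = sib.previousSibling
    Loop
End Function

Sub TestSiblingWalk()
    Dim html As HTMLDocument, post As Object, i As Long
    Set html = GetHTMLFileContent("C:\Sample.html")
    Set post = html.querySelectorAll("span.course-player__chapter-item__completion")
    For i = 0 To post.Length - 1
        ActiveSheet.Cells(i + 1, 1) = PreviousTextOf(post.item(i))         ' title
        ActiveSheet.Cells(i + 1, 2) = CleanText(post.item(i).innerText)    ' completion count
    Next i
End Sub
```

If this works for your file, it avoids depending on the exact number of space-separated tokens in the completion text, which the Split approach relies on.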