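I have 5109 HTML files on my old website.
I want to extract only the text from <title>Title 1</title>
and from <span class="mtr_message"> Text exemple 1</span>
and export the results to a CSV file, like this:
Title 1 in the first cell and Text exemple 1 in the second cell.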
Answer 0 (score: 0)
Try the WSH VBS code below. Paste in your own paths, save it as a .vbs file, and run it.
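Option Explicit

Dim sSourceFolder, sResultFile, sRes, oFile, sCont

sSourceFolder = "C:\Users\DELL\Desktop\tmp" ' source files folder path
sResultFile = "C:\Users\DELL\Desktop\tmp\result.csv" ' result csv file path
sRes = ""

With CreateObject("Scripting.FileSystemObject")
    For Each oFile In .GetFolder(sSourceFolder).Files
        ' Only non-empty files with the "htm" extension are processed
        If LCase(.GetExtensionName(oFile.Name)) = "htm" And oFile.Size > 0 Then
            ' Read the whole file: 1 = ForReading, -2 = system default encoding
            With .OpenTextFile(oFile.Path, 1, False, -2)
                If .AtEndOfStream Then sCont = "" Else sCont = .ReadAll
                .Close
            End With
            With CreateObject("VBScript.RegExp")
                .Global = True
                .IgnoreCase = True
                .Multiline = True
                ' Capture the title text, then the first mtr_message span after it
                .Pattern = "<title>(.*?)</title>[\s\S]*?<span class=""mtr_message"">(.*?)</span>"
                With .Execute(sCont)
                    ' Add a row only when the file yields exactly one match
                    If .Count = 1 Then sRes = sRes & CsvField(.Item(0).SubMatches(0)) & "," & CsvField(.Item(0).SubMatches(1)) & vbCrLf
                End With
            End With
        End If
    Next
    ' Write all rows at once: 2 = ForWriting, True = create the file, 0 = ASCII
    With .OpenTextFile(sResultFile, 2, True, 0)
        .Write sRes
        .Close
    End With
End With

MsgBox "Completed"

' Quote a value for CSV: trim spaces and double any embedded quotes
Function CsvField(sValue)
    CsvField = """" & Replace(Trim(sValue), """", """""") & """"
End Function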
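For readers without Windows Script Host, the same extraction step can be sketched in Python. This is not part of the answer's script, only an illustration of the technique under some assumptions: the names `PATTERN`, `extract_row`, and `rows_to_csv` are made up for this example, the first match in a file is taken (the VBScript instead requires exactly one match), and HTML entities are left undecoded.

```python
import csv
import io
import re

# Same idea as the VBScript pattern: the title first, then the mtr_message span.
# re.DOTALL lets "." cross line breaks, so multi-line content is still captured.
PATTERN = re.compile(
    r'<title>(.*?)</title>.*?<span class="mtr_message">(.*?)</span>',
    re.IGNORECASE | re.DOTALL,
)


def extract_row(html):
    """Return (title, message) with surrounding whitespace trimmed, or None."""
    match = PATTERN.search(html)
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()


def rows_to_csv(rows):
    """Serialize rows to CSV text; csv.writer escapes commas and quotes."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\r\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample built from the snippets in the question.
sample = (
    "<html><head><title>Title 1</title></head><body>"
    '<span class="mtr_message"> Text exemple 1</span>'
    "</body></html>"
)
print(rows_to_csv([extract_row(sample)]), end="")
```

For the sample above this prints the single row `"Title 1","Text exemple 1"`. Letting `csv.writer` build the row avoids hand-written quoting, which matters once a title contains a comma or a double quote.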
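You may need to change the file extension and encoding settings in the script. As written, it processes only files with the htm extension (the question mentions html files, so change "htm" to "html" if that is what yours use) and reads them with the system default encoding, which is the -2 passed as the last argument in .OpenTextFile(oFile.Path, 1, False, -2). Use -1 instead to read Unicode (UTF-16) files, or 0 to read ASCII.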
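If the files turn out to be UTF-8, which none of the three OpenTextFile format values covers, the encoding has to be handled another way. A minimal Python sketch of one possible fallback order follows; the function name `read_html` and the choice of cp1252 as the last resort are assumptions for illustration, not something stated in the question or answer.

```python
import codecs
from pathlib import Path


def read_html(path, fallback="cp1252"):
    """Decode as UTF-16 when a BOM is present, else UTF-8, else `fallback`."""
    data = Path(path).read_bytes()
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The BOM selects the byte order and is dropped from the result.
        return data.decode("utf-16")
    try:
        # "utf-8-sig" also strips a UTF-8 BOM if the file starts with one.
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Assumed legacy code page; undecodable bytes become U+FFFD.
        return data.decode(fallback, errors="replace")
```

Calling `read_html(path)` on each file before running the regular expression keeps the decoding decision in one place, so switching the fallback code page is a one-argument change.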