Convert multiple Word documents to HTML files using VBA

Date: 2019-06-14 06:03:07

Tags: vba ms-word

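How can I use VBA to convert multiple MS Word documents in a folder to HTML? I have done this with PowerShell before, but unfortunately I am now blocked from running scripts.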

2 answers:

Answer 0: (score: 0)

Maybe this will help?

############################################## ##############
Option Explicit

Sub ChangeDocsToTxtOrRTFOrHTML()
'with export to PDF in Word 2007
    Dim fs As Object
    Dim oFolder As Object
    Dim tFolder As Object
    Dim oFile As Object
    Dim strDocName As String
    Dim intPos As Integer
    Dim locFolder As String
    Dim fileType As String
    On Error Resume Next
    locFolder = InputBox("Enter the folder path to DOCs", "File Conversion", "C:\myDocs")
    ' Application.Version is a string such as "12.0"; Val makes the numeric comparison explicit
    Select Case Val(Application.Version)
        Case Is < 12
            Do
                fileType = UCase(InputBox("Change DOC to TXT, RTF, HTML", "File Conversion", "TXT"))
            Loop Until (fileType = "TXT" Or fileType = "RTF" Or fileType = "HTML")
        Case Is >= 12
            Do
                fileType = UCase(InputBox("Change DOC to TXT, RTF, HTML or PDF(2007+ only)", "File Conversion", "TXT"))
            Loop Until (fileType = "TXT" Or fileType = "RTF" Or fileType = "HTML" Or fileType = "PDF")
    End Select
    Application.ScreenUpdating = False
    Set fs = CreateObject("Scripting.FileSystemObject")
    Set oFolder = fs.GetFolder(locFolder)
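    ' Create the output folder only if it does not already exist.
    ' Note: no path separator is added, so "C:\myDocs" yields "C:\myDocsConverted".
    If Not fs.FolderExists(locFolder & "Converted") Then fs.CreateFolder locFolder & "Converted"
    Set tFolder = fs.GetFolder(locFolder & "Converted")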
    For Each oFile In oFolder.Files
        ' Only process Word documents; skip the "~$" lock files Word creates
        If LCase(fs.GetExtensionName(oFile.Name)) Like "doc*" And Left(oFile.Name, 2) <> "~$" Then
            Dim d As Document
            Set d = Application.Documents.Open(oFile.Path)
            ' Strip the extension from the document name
            strDocName = d.Name
            intPos = InStrRev(strDocName, ".")
            strDocName = Left(strDocName, intPos - 1)
            ' Save into the output folder
            ChangeFileOpenDirectory tFolder.Path
            Select Case fileType
            Case Is = "TXT"
                strDocName = strDocName & ".txt"
                d.SaveAs FileName:=strDocName, FileFormat:=wdFormatText
            Case Is = "RTF"
                strDocName = strDocName & ".rtf"
                d.SaveAs FileName:=strDocName, FileFormat:=wdFormatRTF
            Case Is = "HTML"
                strDocName = strDocName & ".html"
                d.SaveAs FileName:=strDocName, FileFormat:=wdFormatFilteredHTML
            Case Is = "PDF"
                strDocName = strDocName & ".pdf"

                ' *** Word 2007 users - remove the apostrophe at the start of the next line ***
                'd.ExportAsFixedFormat OutputFileName:=strDocName, ExportFormat:=wdExportFormatPDF

            End Select
            ' Close without prompting, then switch back to the input folder
            d.Close SaveChanges:=wdDoNotSaveChanges
            ChangeFileOpenDirectory oFolder.Path
        End If
    Next oFile
    Application.ScreenUpdating = True
End Sub
############################################## ##############

Also, see this...

https://www.youtube.com/watch?v=4vFQV6RtYMM

Answer 1: (score: 0)

I wrote a Word VBA program that does this. The program is open source, under the MIT license. It lives inside a Word document on GitHub:

https://github.com/jimyuill/word-web-nav/blob/main/tools/generate_word_html.docm

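The text of the Word doc describes how to use the program. The code is commented to explain how it works (Alt+F11 opens the IDE).

The program can optionally add a table of contents at the start of each document. For the OP's purposes, that part of the code can be ignored.

As @Cindy Meister mentioned, the OP is "too broad". The program is too large to post all of its code here. In summary: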

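Prompting for the input Word docs:

There is code that prompts the user for the Word docs to convert, taken from a particular directory. Most of that code was adapted from the sample program here: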

https://docs.microsoft.com/en-us/office/vba/api/office.filedialog.initialview

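One of two APIs is used to get the list of Word docs. Each API provides a GUI for browsing the file system.

One API lets the user select specific files in a directory. The API is: Application.FileDialog(msoFileDialogFilePicker)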

https://docs.microsoft.com/en-us/office/vba/api/office.filedialog

The other API lets the user select only a directory. All of the Word docs in it are converted. The API is: Application.FileDialog(msoFileDialogFolderPicker)

https://docs.microsoft.com/en-us/office/vba/api/office.msofiledialogtype
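
As a rough sketch of the folder-picker route (this is not the code from the linked program), something like the following could return the chosen directory. The function name PickSourceFolder and the dialog title are made up for illustration, and it assumes the code runs inside Word, where Application.FileDialog and the msoFileDialogFolderPicker constant are available.

```vba
Option Explicit

' Hypothetical helper: lets the user pick one folder.
' Returns the folder path, or "" if the dialog is cancelled.
Function PickSourceFolder() As String
    With Application.FileDialog(msoFileDialogFolderPicker)
        .Title = "Select the folder containing the Word docs"
        .AllowMultiSelect = False
        ' Show returns -1 when the user confirms and 0 on cancel
        If .Show = -1 Then
            PickSourceFolder = .SelectedItems(1)
        Else
            PickSourceFolder = ""
        End If
    End With
End Function
```

Swapping in msoFileDialogFilePicker with AllowMultiSelect = True would instead give a list of individual files in SelectedItems.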

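The program checks whether any two input Word docs share the same root name while differing only in extension (.doc, .docx, or .docm), e.g., "foo.doc" and "foo.docx". This is not allowed for the input Word docs because, for each Word doc, a Word HTML file is created using the Word doc's root name: <root name>.html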
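
The root-name check described above could look roughly like the sketch below. It is only an illustration, not the linked program's code: the function name FindDuplicateRootName and the choice of a Scripting.Dictionary for tracking names are my assumptions.

```vba
Option Explicit

' Hypothetical helper: returns the first root name shared by two of the
' given file paths (compared case-insensitively), or "" if all are unique.
Function FindDuplicateRootName(filePaths As Collection) As String
    Dim fileSystemObj As Object
    Dim seenNames As Object
    Dim filePath As Variant
    Dim rootName As String

    Set fileSystemObj = CreateObject("Scripting.FileSystemObject")
    Set seenNames = CreateObject("Scripting.Dictionary")
    ' Must be set while the dictionary is empty; "Foo" and "foo" then collide
    seenNames.CompareMode = vbTextCompare

    For Each filePath In filePaths
        ' GetBaseName drops the folder and the extension
        rootName = fileSystemObj.GetBaseName(filePath)
        If seenNames.Exists(rootName) Then
            FindDuplicateRootName = rootName
            Exit Function
        End If
        seenNames.Add rootName, filePath
    Next filePath

    FindDuplicateRootName = ""
End Function
```

With a collection holding "C:\docs\foo.doc" and "C:\docs\foo.docx", the function would return "foo", and the caller could stop and ask the user to rename one of the files.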

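Saving each Word doc as HTML:

In the selected directory, the program creates a subdirectory that holds the HTML files and directories generated for the Word docs.

The program loops over each input Word doc. Below are the highlights of the code for each doc (some code is omitted). The code is based on @ASH's answer:
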
    ' Open the Word doc
    Set wordDocObj = Application.Documents.Open(fileObj.Path)

    ' Change the current directory to the output-directory
    ChangeFileOpenDirectory outputFolderObj        

    ' Save the Word-doc in the format "filtered" HTML
    fileBaseName = fileSystemObj.GetBaseName(fileName)
    outputFileName = fileBaseName & ".html"
    ActiveDocument.SaveAs fileName:=outputFileName, FileFormat:=wdFormatFilteredHTML

    ' Close the Word doc
    wordDocObj.Close

    ' Change the current directory to the input-directory
    ChangeFileOpenDirectory sourceFolderObj
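
For readers who want something they can paste and run, here is a minimal self-contained sketch of the same per-document steps. It is not the code from the linked program: the procedure name, the "html_output" subfolder name, and the choice to pass a full path to SaveAs2 (rather than calling ChangeFileOpenDirectory) are my assumptions. SaveAs2 requires Word 2010 or later.

```vba
Option Explicit

' Sketch: convert every .doc/.docx/.docm in sourceFolderPath to filtered HTML.
Sub ConvertFolderToFilteredHtml(sourceFolderPath As String)
    Dim fileSystemObj As Object
    Dim fileObj As Object
    Dim wordDocObj As Document
    Dim outputFolderPath As String
    Dim ext As String

    Set fileSystemObj = CreateObject("Scripting.FileSystemObject")

    ' "html_output" is a placeholder name for the output subdirectory
    outputFolderPath = fileSystemObj.BuildPath(sourceFolderPath, "html_output")
    If Not fileSystemObj.FolderExists(outputFolderPath) Then
        fileSystemObj.CreateFolder outputFolderPath
    End If

    For Each fileObj In fileSystemObj.GetFolder(sourceFolderPath).Files
        ext = LCase(fileSystemObj.GetExtensionName(fileObj.Name))
        ' Skip non-Word files, Word's "~$" lock files, and the macro's own document
        If (ext = "doc" Or ext = "docx" Or ext = "docm") _
                And Left(fileObj.Name, 2) <> "~$" _
                And StrComp(fileObj.Path, ThisDocument.FullName, vbTextCompare) <> 0 Then
            Set wordDocObj = Application.Documents.Open( _
                FileName:=fileObj.Path, ReadOnly:=True, AddToRecentFiles:=False)
            ' Save under the doc's root name with an .html extension
            wordDocObj.SaveAs2 _
                FileName:=fileSystemObj.BuildPath(outputFolderPath, _
                    fileSystemObj.GetBaseName(fileObj.Name) & ".html"), _
                FileFormat:=wdFormatFilteredHTML
            wordDocObj.Close SaveChanges:=wdDoNotSaveChanges
        End If
    Next fileObj
End Sub
```

It could be called from the Immediate window with, for example, ConvertFolderToFilteredHtml "C:\myDocs". Passing a full output path keeps Word's current directory untouched, which is why the two ChangeFileOpenDirectory calls are not needed in this variant.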