我正在尝试将许多旧的.DOC文件转换为PDF格式或RTF格式。到目前为止,我找到了一个完成后者(转换为RTF),但旧Word应用程序的格式仍然存在于文档中。如果您打开Microsoft Word(我使用2010)并单击文件>打开,有一个下拉菜单,允许您选择“从任何文件中恢复文本(。)”。是否可以在转换过程中使用它来过滤掉.DOC文档中的格式数据?以下是我目前正在尝试修改的脚本的几个示例:
虽然它似乎只是将.rtf附加到文件的末尾而不是更改格式,但这个有用了:
Sub SaveAllAsDOCX()
Dim strFilename As String
Dim strDocName As String
Dim strPath As String
Dim oDoc As Document
Dim fDialog As FileDialog
Dim intPos As Integer
Set fDialog = Application.FileDialog(msoFileDialogFolderPicker)
With fDialog
.Title = "Select folder and click OK"
.AllowMultiSelect = False
..InitialView = msoFileDialogViewList
If .Show <> -1 Then
MsgBox "Cancelled By User", , "List Folder Contents"
Exit Sub
End If
strPath = fDialog.SelectedItems.Item(1)
If Right(strPath, 1) <> "\" Then strPath = strPath + "\"
End With
If Documents.Count > 0 Then
Documents.Close SaveChanges:=wdPromptToSaveChanges
End If
If Left(strPath, 1) = Chr(34) Then
strPath = Mid(strPath, 2, Len(strPath) - 2)
End If
strFilename = Dir$(strPath & "*.doc")
While Len(strFilename) <> 0
Set oDoc = Documents.Open(strPath & strFilename)
strDocName = ActiveDocument.FullName
intPos = InStrRev(strDocName, ".")
strDocName = Left(strDocName, intPos - 1)
strDocName = strDocName & ".docx"
oDoc.SaveAs FileName:=strDocName, _
FileFormat:=wdFormatDocumentDefault
oDoc.Close SaveChanges:=wdDoNotSaveChanges
strFilename = Dir$()
Wend
End Sub
到目前为止,在任何转换中都没有成功:
Option Explicit
Sub ChangeDocsToTxtOrRTFOrHTML()
'with export to PDF in Word 2007
Dim fs As Object
Dim oFolder As Object
Dim tFolder As Object
Dim oFile As Object
Dim strDocName As String
Dim intPos As Integer
Dim locFolder As String
Dim fileType As String
On Error Resume Next
locFolder = InputBox("Enter the folder path to DOCs", "File Conversion", "C:\myDocs")
Select Case Application.Version
Case Is < 12
Do
fileType = UCase(InputBox("Change DOC to TXT, RTF, HTML", "File Conversion", "TXT"))
Loop Until (fileType = "TXT" Or fileType = "RTF" Or fileType = "HTML")
Case Is >= 12
Do
fileType = UCase(InputBox("Change DOC to TXT, RTF, HTML or PDF(2007+ only)", "File Conversion", "TXT"))
Loop Until (fileType = "TXT" Or fileType = "RTF" Or fileType = "HTML" Or fileType = "PDF")
End Select
Application.ScreenUpdating = False
Set fs = CreateObject("Scripting.FileSystemObject")
Set oFolder = fs.GetFolder(locFolder)
Set tFolder = fs.CreateFolder(locFolder & "Converted")
Set tFolder = fs.GetFolder(locFolder & "Converted")
For Each oFile In oFolder.Files
Dim d As Document
Set d = Application.Documents.Open(oFile.Path)
strDocName = ActiveDocument.Name
intPos = InStrRev(strDocName, ".")
strDocName = Left(strDocName, intPos - 1)
ChangeFileOpenDirectory tFolder
Select Case fileType
Case Is = "TXT"
strDocName = strDocName & ".txt"
ActiveDocument.SaveAs FileName:=strDocName, FileFormat:=wdFormatText
Case Is = "RTF"
strDocName = strDocName & ".rtf"
ActiveDocument.SaveAs FileName:=strDocName, FileFormat:=wdFormatRTF
Case Is = "HTML"
strDocName = strDocName & ".html"
ActiveDocument.SaveAs FileName:=strDocName, FileFormat:=wdFormatFilteredHTML
Case Is = "PDF"
strDocName = strDocName & ".pdf"
' *** Word 2007 users - remove the apostrophe at the start of the next line ***
'ActiveDocument.ExportAsFixedFormat OutputFileName:=strDocName, ExportFormat:=wdExportFormatPDF
End Select
d.Close
ChangeFileOpenDirectory oFolder
Next oFile
Application.ScreenUpdating = True
End Sub
答案 0 :(得分:1)
我将使用VBA脚本覆盖一种方式来执行您想要的操作,而无需使用Word内置的“从任何文件中恢复文本”模式功能。
它将一个目录中的每个.doc / .docx转换为.txt,但可以用于转换为父应用程序支持的任何其他格式(我使用Word 2010测试过)。如下:
'------------ VBA script start -------------
Sub one1()
Set fs = CreateObject("Scripting.FileSystemObject")
Set list1 = fs.GetFolder(ActiveDocument.Path)
For Each fl In list1.files
If InStr(fl.Type, "Word") >= 1 And Not fl.Path = ActiveDocument.Path & "\" & ActiveDocument.Name Then
Set wordapp = CreateObject("word.Application")
Set Doc1 = wordapp.Documents.Open(fl.Path)
'wordapp.Visible = True
Doc1.SaveAs2 FileName:=fl.Name & ".txt", fileformat:=wdFormatText
wordapp.Quit
End If
Next
End Sub
'------------ VBA script start -------------
要另存为PDF,请使用
Doc1.SaveAs2 FileName:=fl.Name & ".pdf", fileformat:=wdFormatPDF
代替
保存为RTF,请使用
Doc1.SaveAs2 FileName:=fl.Name & ".rtf", fileformat:=wdFormatRTF
代替
或者说,HTML:
Doc1.SaveAs2 FileName:=fl.Name & ".html", fileformat:=wdFormatHTML
等等。
我没有费心检查的一些缺点,因为它们是无害的:
在执行结束时会弹出一条错误消息,但不会产生任何后果。
它试图打开自己,因为它是文档本身内部的VBA脚本,而且它是一个文档开启脚本。然后,当弹出一条消息时,你必须指示“他”以只读方式打开它。
它会将所有文档保存到C:\ users \ username \ Documents中,而不是从中执行的文档,在大多数情况下会更好。
过程缓慢,预计大多数普通个人电脑的速度为2-3档/秒。