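Is there any way to optimize a PDF programmatically?

Time: 2014-04-23 23:42:26

Tags: python-2.7 pdf word-vba

I would like to programmatically optimize a series of PDF files (i.e. the equivalent of "Save As > Reduced Size PDF" in Acrobat Pro 10). If possible, I would prefer to do this from Python 2.7.5; failing that, from Word VBA; and as a last resort, from some other programming mechanism.

Ideas?

4 answers: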

Answer 0: (score: 1)

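One suggestion is to look at pdfsizeopt.

It is a Python program designed to act as a PDF file size optimizer. It can be used to convert large PDFs into smaller ones, and it provides a command-line interface that you can call.

Detailed description:

pdfsizeopt is a program for converting large PDF files to small ones. More specifically, pdfsizeopt is a free, cross-platform command-line application (for Linux, Mac OS X, Windows and Unix) and a collection of best practices to optimize the size of PDF files, with a focus on PDFs created from TeX and LaTeX documents. pdfsizeopt is written in Python, so it is a bit slow, but it offloads some of the heavy work to its faster (C, C++ and Java) dependencies. pdfsizeopt was developed on a Linux system, and it depends on existing tools such as Python 2.4, Ghostscript 8.50, jbig2enc (optional), sam2p, pngtopnm, pngout (optional), and the Multivalent PDF compressor (optional) written in Java.

Reference: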

http://code.google.com/p/pdfsizeopt/
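
Because pdfsizeopt is a command-line tool, one way to drive it from Python is through `subprocess`. The following is only a minimal sketch under stated assumptions: it targets Python 3 rather than the 2.7 mentioned in the question, it assumes a `pdfsizeopt` executable is available on the PATH, and it assumes the tool is invoked with the input path followed by the output path. The helper names `build_pdfsizeopt_command` and `optimize_folder` are hypothetical and not part of pdfsizeopt.

```python
import subprocess
from pathlib import Path


def build_pdfsizeopt_command(src, dst, executable="pdfsizeopt"):
    """Return the argument list for one run (assumed CLI: input path, then output path)."""
    return [executable, str(src), str(dst)]


def optimize_folder(source_dir, target_dir, executable="pdfsizeopt"):
    """Run pdfsizeopt on every .pdf file in source_dir and write results to target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(source_dir).iterdir()):
        # Skip anything that is not a PDF (the comparison ignores case)
        if not src.is_file() or src.suffix.lower() != ".pdf":
            continue
        dst = target / src.name
        # check=True raises CalledProcessError if the tool reports a failure
        subprocess.run(build_pdfsizeopt_command(src, dst, executable), check=True)
        written.append(dst)
    return written
```

A call such as `optimize_folder("C:/Temp/in", "C:/Temp/out")` would then process a whole directory; both paths here are placeholders.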

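Answer 1: (score: 0)

Another option is the Aspose.PDF Cloud SDK for Python. It is a paid REST API, but it offers 150 free API calls per month. Currently, it compresses PDF documents held in cloud storage (Aspose default storage, Amazon S3, Google Drive, Azure Storage, Dropbox or FTP storage). We plan to support compressing a PDF from the request body (stream) in the near future.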

import os
import asposepdfcloud
from asposepdfcloud.apis.pdf_api import PdfApi
from shutil import copyfile

# Get App key and App SID from https://cloud.aspose.com
pdf_api_client = asposepdfcloud.api_client.ApiClient(
    app_key='xxxxxxxxxxxxxxxxxxxxxxxxxx',
    app_sid='xxxxx-xxxx-xxxx-xxxx-xxxxxxxx')

pdf_api = PdfApi(pdf_api_client)
temp_folder="Temp"

#upload PDF file to storage

data_file = "C:/Temp/02_pages.pdf"
remote_name="02_pages.pdf"
result_name="02_pages_compressed.pdf"

pdf_api.upload_file(temp_folder + '/' + remote_name,data_file)

optimize_options = asposepdfcloud.models.OptimizeOptions(
                allow_reuse_page_content=False,
                compress_images=True,
                image_quality=100,
                link_duplcate_streams=True,
                remove_unused_objects=True,
                remove_unused_streams=True,            
                unembed_fonts=True)
opts = {
            "options" : optimize_options,
            "folder" : temp_folder
        }

response = pdf_api.post_optimize_document(remote_name, **opts)

#download PDF file from storage
response_download = pdf_api.download_file(temp_folder + '/' + remote_name)
copyfile(response_download, 'C:/Temp/' + result_name)
print(response)

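P.S.: I am a developer evangelist at Aspose.

Answer 2: (score: 0)

I am using Ghostscript to batch-process PDFs. This VBA works in both Word and Excel. It prompts for a source directory and a destination directory. A .bat file is created and saved in the source folder, and you can then execute it. I may make this script more robust and will update it here when I do.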

Ghostscript

Sub gsPDF_Bat()
  
    'Summary of -dPDFSETTINGS:

    '-dPDFSETTINGS=/screen lower quality, smaller size. (72 dpi)
    '-dPDFSETTINGS=/ebook for better quality, but slightly larger pdfs. (150 dpi)
    '-dPDFSETTINGS=/prepress output similar to Acrobat Distiller "Prepress Optimized" setting (300 dpi)
    '-dPDFSETTINGS=/printer selects output similar to the Acrobat Distiller "Print Optimized" setting (300 dpi)
    '-dPDFSETTINGS=/default selects output intended to be useful across a wide variety of uses, possibly at the expense of a larger output file
      
    Dim ProofsFolder As String
    Dim CompressFolder As String
    Dim exePath As String

    exePath = "C:\Program Files\gs\gs9.54.0\bin\"

    ' Open the select folder prompt
    With Application.FileDialog(msoFileDialogFolderPicker)
        If .Show = -1 Then ' if OK is pressed
            ProofsFolder = .SelectedItems(1)
        End If
    End With
    
    With Application.FileDialog(msoFileDialogFolderPicker)
        If .Show = -1 Then ' if OK is pressed
            CompressFolder = .SelectedItems(1)
        End If
    End With
        
    Dim fso As Object
    Dim folder As Object
    Dim CurrFile As Object

  
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set folder = fso.GetFolder(ProofsFolder)
       
    Open ProofsFolder & "\gsPDF-Compress.bat" For Output As #1
       
    For Each CurrFile In folder.Files
        FName = CurrFile.Name

        ' Compare case-insensitively so that ".PDF" files are included too
        If LCase(Right(FName, 4)) = ".pdf" Then
            ' The executable path contains a space, so it must be quoted in the batch file
            Print #1, """" & exePath & "gswin64"" -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dAutoRotatePages=/None -r300 -dUseCIEColor -sOutputFile=""" & CompressFolder & "\" & FName & """ """ & CurrFile.Path & """"
        End If
    Next
    Close #1

    Set fso = Nothing
    Set folder = Nothing
End Sub
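
Since the question prefers Python, the same kind of Ghostscript call can also be issued directly, without an intermediate .bat file. The following is a minimal sketch, assuming Python 3 and a Ghostscript console executable at the location shown in `GS_EXE` (an assumption based on the install path used in the macro above; adjust it to your system). The names `GS_EXE`, `PRESETS`, `build_gs_command` and `compress_folder` are hypothetical. It selects a `-dPDFSETTINGS` preset instead of the `-r300` flag used in the macro.

```python
import subprocess
from pathlib import Path

# Assumption: Ghostscript's console executable lives here; change as needed.
GS_EXE = r"C:\Program Files\gs\gs9.54.0\bin\gswin64c.exe"

# The presets summarized in the macro's comments
PRESETS = ("/screen", "/ebook", "/printer", "/prepress", "/default")


def build_gs_command(src, dst, preset="/ebook", gs_exe=GS_EXE):
    """Return the Ghostscript argument list that rewrites src into dst."""
    if preset not in PRESETS:
        raise ValueError("unknown preset: " + preset)
    return [
        gs_exe,
        "-dNOPAUSE",
        "-dBATCH",
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        "-dPDFSETTINGS=" + preset,
        "-sOutputFile=" + str(dst),
        str(src),
    ]


def compress_folder(source_dir, target_dir, preset="/ebook", gs_exe=GS_EXE):
    """Compress every .pdf in source_dir into target_dir, one Ghostscript run per file."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(source_dir).iterdir()):
        if src.is_file() and src.suffix.lower() == ".pdf":
            # Passing a list avoids manual quoting of paths that contain spaces
            subprocess.run(build_gs_command(src, target / src.name, preset, gs_exe), check=True)
```

Passing the arguments as a list lets `subprocess` handle quoting, so paths with spaces need no special treatment.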

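Answer 3: (score: 0)

As in my previous answer, this still uses Ghostscript. I noticed that when we selected 1,000 or so PDFs for batch optimization, Excel took several minutes to finish writing the bat file. I wrote a different version that creates a new worksheet, assembles the bat file there, and then saves it. Even with 1,000 records, this takes only a few seconds.

This script will not run in Word, because it needs to create a new Excel worksheet. The script could be updated to use a Word document instead. The .prn format limits the length of each line, so I had to wrap each command across lines with "^".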

Sub gsPDF_Bat()
 'https://www.ghostscript.com/doc/current/VectorDevices.htm#distillerparams
    
    Dim ProofsFolder As String
    Dim CompressFolder As String
    Dim OrigSheet As String
    Dim exePath As String
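    ' Each variable needs its own "As String"; otherwise only the last one is typed
    Dim CmdLine As String, CmdLine2 As String, CmdLine3 As String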
    
    exePath = "C:\Program Files\gs\gs9.54.0\bin\"

    ' Open the select folder prompt
    With Application.FileDialog(msoFileDialogFolderPicker)
        If .Show = -1 Then ' if OK is pressed
            ProofsFolder = .SelectedItems(1)
        End If
    End With
    
    If ProofsFolder <> "" Then ' if a folder was chosen
        Debug.Print ProofsFolder
    End If
    With Application.FileDialog(msoFileDialogFolderPicker)
        If .Show = -1 Then ' if OK is pressed
            CompressFolder = .SelectedItems(1)
        End If
    End With
    
    If CompressFolder <> "" Then ' if a folder was chosen
        Debug.Print CompressFolder
    End If
    
    Dim fso As Object
    Dim folder As Object
    Dim CurrFile As Object

  
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set folder = fso.GetFolder(ProofsFolder)
       
    cell = 0
    OrigSheet = ActiveSheet.Name
    
    Sheets.Add(After:=Sheets(Sheets.Count)).Name = "temp"
    Application.DisplayAlerts = False
    
    For Each CurrFile In folder.Files
        FName = CurrFile.Name

        ' Compare case-insensitively so that ".PDF" files are included too
        If LCase(Right(FName, 4)) = ".pdf" Then
            Debug.Print "CurrFile Found: " & CurrFile.Path

            ' A trailing ^ continues a batch command on the next line.
            ' The executable path contains a space, so it is wrapped in quotes.
            CmdLine = """" & exePath & "gswin64"" -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dAutoRotatePages=/None -dPDFSETTINGS=/prepress -dUseCIEColor -^"
            CmdLine2 = "sOutputFile=""" & CompressFolder & "\z" & FName & """^"
            CmdLine3 = " " & """" & CurrFile.Path & """"

            ' Each command occupies three consecutive rows in column A
            Sheets("temp").Range("A" & cell + 1).Value = CmdLine
            Sheets("temp").Range("A" & cell + 2).Value = CmdLine2
            Sheets("temp").Range("A" & cell + 3).Value = CmdLine3
            cell = cell + 3
        End If
    Next

    Sheets("temp").Select
    Sheets("temp").Copy
 
    ActiveWorkbook.SaveAs FileName:= _
        ProofsFolder & "\gsPDF Compress.bat", FileFormat:=xlTextPrinter, _
        CreateBackup:=False
    
    ActiveWorkbook.Close

    Sheets("temp").Delete
    Sheets(OrigSheet).Select

    Application.DisplayAlerts = True
    Set fso = Nothing
    Set folder = Nothing

End Sub
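
If the batch file is written as plain text instead of through a worksheet saved as .prn, each command can stay on one line and no "^" wrapping is needed. The following is a minimal sketch of that idea, assuming Python 3 and the same assumed Ghostscript location as in the macro (with the console executable `gswin64c.exe`). The names `GS_EXE`, `batch_line` and `write_batch_file` are hypothetical, and the flag set is a trimmed copy of the one used in the macro.

```python
from pathlib import PureWindowsPath

# Assumption: Ghostscript's console executable lives here; change as needed.
GS_EXE = r"C:\Program Files\gs\gs9.54.0\bin\gswin64c.exe"


def batch_line(src, target_dir, gs_exe=GS_EXE, prefix="z"):
    """Build one Ghostscript command line for a Windows batch file."""
    src = PureWindowsPath(src)
    # Like the macro, prefix the output file name with "z"
    dst = PureWindowsPath(target_dir) / (prefix + src.name)
    # Every path is quoted because any of them may contain spaces
    return (
        '"{exe}" -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 '
        '-dAutoRotatePages=/None -dPDFSETTINGS=/prepress '
        '-sOutputFile="{dst}" "{src}"'
    ).format(exe=gs_exe, dst=dst, src=src)


def write_batch_file(pdf_paths, target_dir, bat_path):
    """Write one command per PDF to bat_path and return the number of commands."""
    lines = [
        batch_line(path, target_dir)
        for path in pdf_paths
        if str(path).lower().endswith(".pdf")
    ]
    # newline="\r\n" makes Python emit Windows line endings
    with open(bat_path, "w", newline="\r\n") as handle:
        handle.write("\n".join(lines) + "\n")
    return len(lines)
```

File names that contain `%` would need extra escaping in a batch file; this sketch does not handle that case.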