Question

For my work, I often have to merge PDFs by owner name. My PDF names look like this:

TX-BRA-OO12NGA-399.00-John Doe
TX-BRA-OO12NGA-400.00-John Doe
TX-BRA-OO12NGA-450.00-Bob
TX-BRA-OO12NGA-451.00-Bill

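You can imagine this list being 100 pages long, with several names that repeat and several that appear only once. My goal is to merge the PDFs that share the same name, and to keep the first 15 characters (TX-BRA-OO12NGA-) in the merged PDF's name, followed by the numbers that come after them, separated by commas. Using the short list above, the final output should be: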

TX-BRA-OO12NGA-399.00, 400.00-John Doe
TX-BRA-OO12NGA-450.00-Bob
TX-BRA-OO12NGA-451.00-Bill

where "TX-BRA-OO12NGA-399.00, 400.00-John Doe" is the merged PDF of the following files:

TX-BRA-OO12NGA-399.00-John Doe
TX-BRA-OO12NGA-400.00-John Doe

I know my code is probably very rough... I'm new to coding in Python. Any help you can offer would be greatly appreciated.

def main():

    #geektechstuff
    #Python script to merge multiple PDF files into one PDF

    #Requires the “PyPDF2” and “OS” modules to be imported
    import os, PyPDF2

    #Ask user where the PDFs are
    userpdflocation=input('Folder path to PDFs that need merging')

    #Sets the scripts working directory to the location of the PDFs
    os.chdir(userpdflocation)

    #Ask user for the name to save the file as


    #Get all the PDF filenames
    pdf2merge = []
    for filename in os.listdir('.'):
        if filename.endswith('.pdf') and (filename[:-14], filename[:-14]):
            pdf2merge.append(filename)
        name = (filename[:16-21]) + ", " + (filename[:16-21])
        name2 = (filename[:25-100])
        pdfWriter = PyPDF2.PdfFileWriter()

        #loop through all PDFs
        for filename in pdf2merge:
        #rb for read binary
            pdfFileObj = open(filename,'rb')
            pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
        #Opening each page of the PDF
            for pageNum in range(pdfReader.numPages):
                pageObj = pdfReader.getPage(pageNum)
                pdfWriter.addPage(pageObj)
        #save PDF to file, wb for write binary
        pdfOutput = open((filename[:1-15]) + name + name2 + '.pdf', 'wb')
        #Outputting the PDF
        pdfWriter.write(pdfOutput)
        #Closing the PDF writer
        pdfOutput.close()




if __name__ == '__main__':
    main()
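
For reference, here is one way the grouping described above could be sketched. This is not a definitive solution. It assumes every file is named `<15-character prefix><number>-<owner>.pdf`, that the number itself contains no hyphen, and that files should be grouped by prefix plus owner. The names `plan_merges`, `merge_pdfs`, `PREFIX_LEN` and the separate output folder are made up for illustration. The merge step assumes a PyPDF2 release that provides `PdfReader` and `PdfWriter`; the older `PdfFileReader`/`PdfFileWriter` names used in the code above were removed in PyPDF2 3.0.

```python
import os
from collections import defaultdict

PREFIX_LEN = 15  # assumed length of the shared prefix, e.g. "TX-BRA-OO12NGA-"


def plan_merges(filenames):
    """Map each merged output filename to the list of input files it combines."""
    groups = defaultdict(list)
    # Sorting by filename keeps the numbers in a stable order; note this is
    # plain string order, not numeric order.
    for name in sorted(filenames):
        stem, ext = os.path.splitext(name)
        if ext.lower() != ".pdf":
            continue  # ignore anything that is not a PDF
        prefix, rest = stem[:PREFIX_LEN], stem[PREFIX_LEN:]
        # Split "399.00-John Doe" at the first hyphen only, so an owner
        # name that itself contains a hyphen stays intact.
        number, sep, owner = rest.partition("-")
        if not sep:
            continue  # name does not follow the assumed pattern
        groups[(prefix, owner)].append((number, name))

    plan = {}
    for (prefix, owner), items in groups.items():
        numbers = ", ".join(number for number, _ in items)
        plan[f"{prefix}{numbers}-{owner}.pdf"] = [name for _, name in items]
    return plan


def merge_pdfs(folder, out_folder):
    """Write one merged PDF per owner into out_folder (hypothetical helper)."""
    # Imported here so the planning step above works without PyPDF2 installed.
    from PyPDF2 import PdfReader, PdfWriter

    os.makedirs(out_folder, exist_ok=True)
    for out_name, inputs in plan_merges(os.listdir(folder)).items():
        writer = PdfWriter()
        for name in inputs:
            reader = PdfReader(os.path.join(folder, name))
            for page in reader.pages:
                writer.add_page(page)
        with open(os.path.join(out_folder, out_name), "wb") as fh:
            writer.write(fh)
```

Calling `plan_merges(os.listdir(folder))` on its own shows which files would be combined under which name before anything is written. `merge_pdfs(folder, out_folder)` then performs the merge. Writing to a separate output folder keeps the merged files from being picked up as inputs on a second run.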

Merging multiple PDFs by splitting file names

0 answers: