For my job, I often have to merge PDFs by owner name. My PDF names look like this:
TX-BRA-OO12NGA-399.00-John Doe
TX-BRA-OO12NGA-400.00-John Doe
TX-BRA-OO12NGA-450.00-Bob
TX-BRA-OO12NGA-451.00-Bill
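Each of these names has three parts: the fixed 15-character prefix, a number, and the owner's name. A minimal sketch of splitting a name into those parts with a regular expression, assuming the number is always digits with an optional decimal part and that everything after the following hyphen is the owner's name (so hyphenated names such as "Mary-Jane" stay intact). The pattern and the `parse_pdf_name` helper are hypothetical, not part of the original script:

```python
import re

# Assumed layout: <15-char prefix><number>-<owner>, with an optional ".pdf" extension.
FILENAME_RE = re.compile(
    r'^(?P<prefix>.{15})(?P<number>\d+(?:\.\d+)?)-(?P<owner>.+?)(?:\.pdf)?$',
    re.IGNORECASE,
)


def parse_pdf_name(filename):
    """Split e.g. 'TX-BRA-OO12NGA-399.00-John Doe.pdf' into (prefix, number, owner).

    Returns None when the name does not match the assumed layout.
    """
    match = FILENAME_RE.match(filename)
    if match is None:
        return None
    return match.group('prefix'), match.group('number'), match.group('owner')


print(parse_pdf_name('TX-BRA-OO12NGA-399.00-John Doe.pdf'))
# ('TX-BRA-OO12NGA-', '399.00', 'John Doe')
```

Parsing the whole name at once, instead of slicing at fixed offsets from the end, keeps working when owner names have different lengths.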
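Imagine this list is 100 pages long, with some names appearing several times and others appearing only once. My goal is to merge the files that share a name and, in the merged PDF's name, keep the first 15 characters (TX-BRA-OO12NGA-) followed by the numbers from each file, separated by commas. Using the short list above, the final output should be: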
TX-BRA-OO12NGA-399.00,400.00-John Doe
TX-BRA-OO12NGA-450.00-Bob
TX-BRA-OO12NGA-451.00-Bill
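The renaming rule in this example can be checked on plain strings before any PDFs are touched. A minimal sketch, assuming files belong together when both the 15-character prefix and the owner name match exactly, and that the part between the prefix and the next hyphen is always a number; `merged_names` is a hypothetical helper, not from the original post:

```python
from collections import defaultdict


def merged_names(filenames, prefix_len=15):
    """Map each merged output name to the source files it should combine.

    Assumes names look like '<prefix><number>-<owner>' with an optional '.pdf'.
    """
    groups = defaultdict(list)
    for name in filenames:
        stem = name[:-4] if name.lower().endswith('.pdf') else name
        prefix, rest = stem[:prefix_len], stem[prefix_len:]
        number, _, owner = rest.partition('-')  # split at the first hyphen only
        groups[(prefix, owner)].append((float(number), number, name))

    result = {}
    for (prefix, owner), items in groups.items():
        items.sort()  # sort on the numeric value first
        numbers = ','.join(number for _, number, _ in items)
        result[f'{prefix}{numbers}-{owner}'] = [name for _, _, name in items]
    return result


files = [
    'TX-BRA-OO12NGA-400.00-John Doe',
    'TX-BRA-OO12NGA-399.00-John Doe',
    'TX-BRA-OO12NGA-450.00-Bob',
    'TX-BRA-OO12NGA-451.00-Bill',
]
for merged, sources in merged_names(files).items():
    print(merged, '<-', sources)
```

Running it prints the three merged names from the example output, each with its source files in numeric order. Sorting on `float(number)` rather than the raw string matters once numbers have different digit counts, because the string "100.00" sorts before "99.00".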
Here, "TX-BRA-OO12NGA-399.00,400.00-John Doe" is the merged PDF of these files:
TX-BRA-OO12NGA-399.00-John Doe
TX-BRA-OO12NGA-400.00-John Doe
I know my code is probably very rough; I'm new to Python. Any help with this would be greatly appreciated.
# geektechstuff
# Python script to merge PDF files that share an owner name into one PDF per owner
# Requires the PyPDF2 package (pypdf provides the same PdfReader/PdfWriter classes)
import os
from collections import defaultdict

from PyPDF2 import PdfReader, PdfWriter

PREFIX_LEN = 15  # length of "TX-BRA-OO12NGA-"


def main():
    # Ask user where the PDFs are
    folder = input('Folder path to PDFs that need merging: ')

    # Group the PDF filenames by (prefix, owner name)
    groups = defaultdict(list)
    for filename in os.listdir(folder):
        if not filename.lower().endswith('.pdf'):
            continue
        stem = filename[:-4]  # drop ".pdf"
        prefix, rest = stem[:PREFIX_LEN], stem[PREFIX_LEN:]
        # "399.00-John Doe" -> number "399.00", owner "John Doe"
        number, sep, owner = rest.partition('-')
        if not sep:
            continue  # name does not follow the prefix-number-owner pattern
        groups[(prefix, owner)].append((float(number), number, filename))

    # Save into a subfolder so a single-file group does not overwrite its source
    out_dir = os.path.join(folder, 'merged')
    os.makedirs(out_dir, exist_ok=True)

    # Loop through each owner's PDFs in numeric order
    for (prefix, owner), items in groups.items():
        items.sort()
        pdf_writer = PdfWriter()
        for _, _, filename in items:
            pdf_reader = PdfReader(os.path.join(folder, filename))
            # Copy every page of this PDF into the merged document
            for page in pdf_reader.pages:
                pdf_writer.add_page(page)
        numbers = ','.join(number for _, number, _ in items)
        out_name = f'{prefix}{numbers}-{owner}.pdf'
        # wb for write binary
        with open(os.path.join(out_dir, out_name), 'wb') as pdf_output:
            pdf_writer.write(pdf_output)


if __name__ == '__main__':
    main()