Question

I need to merge a folder full of PDFs into a single file. However, they must be combined in a specific order. Examples of the file names:

WR_Mapbook__1.pdf  
WR_Mapbook__1a.pdf  
WR_Mapbook__2.pdf  
WR_Mapbook__2a.pdf  
WR_Mapbook__3.pdf  
WR_Mapbook__3a.pdf  
etc...

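The order in which Windows Explorer sorts them is the order in which I need them added to the single file. However, my script adds all the "a" files first and then the files without an "a". Why does it do that, and how can I sort them so the files are added in the order I want?

See my code below. Thanks!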

from pyPdf import PdfFileWriter, PdfFileReader  
import glob

outputLoc = "K:\\test\\pdf_output\\"
output = PdfFileWriter()


pdfList = glob.glob(r"K:\test\lidar_MB_ALL\*.pdf")
pdfList.sort
print pdfList
for pdf in pdfList:
    print pdf
    input1 = PdfFileReader(file(pdf, "rb"))
    output.addPage(input1.getPage(0))
    # finally, write "output" to document-output.pdf
    outputStream = file(outputLoc + "WR_Imagery_LiDar_Mapbook.pdf", "wb")
    output.write(outputStream)
    print ("adding " + pdf)

 outputStream.close()

Answer 1

Try putting () after pdfList.sort, like this:

pdfList.sort()

The way you wrote it does not actually sort the list. I took your list of file names, put them in an array, and they sorted in the order you show.
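To illustrate the difference, here is a minimal Python 3 sketch. The file names are made-up samples that follow the pattern in the question, and the variable names `before` and `unchanged` are only for illustration. Writing `pdfList.sort` without parentheses merely looks up the method and leaves the list untouched, while `pdfList.sort()` sorts the list in place.

```python
# Hypothetical sample names in an arbitrary order
# (glob does not guarantee any particular ordering).
pdfList = ["WR_Mapbook__2a.pdf", "WR_Mapbook__1a.pdf",
           "WR_Mapbook__2.pdf", "WR_Mapbook__1.pdf"]
before = list(pdfList)

pdfList.sort            # no parentheses: the method is looked up but never called
unchanged = (pdfList == before)

pdfList.sort()          # with parentheses: the list is sorted in place
print(unchanged)
print(pdfList)
```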

Answer 2

What you need is to implement "Natural Order String Comparison". Hopefully someone has already done this and shared it.

Edit: here is a brute-force example of doing this in Python.

import re

digits = re.compile(r'(\d+)')
def tokenize(filename):
    return tuple(int(token) if match else token
                 for token, match in
                 ((fragment, digits.search(fragment))
                  for fragment in digits.split(filename)))

# Now you can sort your PDF file names like so:
pdfList.sort(key=tokenize)
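As a usage sketch in Python 3, the snippet below compares a plain string sort with a natural-order sort on made-up file names that include a two-digit page number. The helper name `natural_key` and the sample names are assumptions for illustration; the helper follows the same idea as above, splitting each name into text and digit runs and comparing the digit runs as integers.

```python
import re

_digits = re.compile(r"(\d+)")

def natural_key(filename):
    # re.split with one capturing group alternates text and digit runs,
    # so the fragments at odd indices are always the digit runs.
    return tuple(int(part) if i % 2 else part
                 for i, part in enumerate(_digits.split(filename)))

# Hypothetical sample names following the pattern in the question.
names = ["WR_Mapbook__10.pdf", "WR_Mapbook__2a.pdf", "WR_Mapbook__1a.pdf",
         "WR_Mapbook__2.pdf", "WR_Mapbook__1.pdf", "WR_Mapbook__10a.pdf"]

plain = sorted(names)                      # character-by-character comparison
natural = sorted(names, key=natural_key)   # numbers compared by value

print(plain)
print(natural)
```

With the plain sort, the names numbered 10 land between 1 and 1a; with the natural key they come after 2a, which is closer to what Windows Explorer displays.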

Answer 3

Replace pdfList.sort with

pdfList = sorted(pdfList, key = lambda x: x[:-4])

or

pdfList = sorted(pdfList, key = lambda x: x.rsplit('.', 1)[0])

to ignore the file extension when sorting.
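A small Python 3 sketch of the extension-stripping key, using made-up sample names (the variable `by_stem` is only for illustration). It also hints at a limitation: the comparison is still character by character, so a name with a two-digit number would not land where Windows Explorer puts it.

```python
# Hypothetical sample names following the pattern in the question.
names = ["WR_Mapbook__1a.pdf", "WR_Mapbook__2.pdf",
         "WR_Mapbook__1.pdf", "WR_Mapbook__10.pdf"]

# Sort on everything before the last dot, i.e. the name without ".pdf".
by_stem = sorted(names, key=lambda x: x.rsplit('.', 1)[0])
print(by_stem)
```

Here "WR_Mapbook__1.pdf" correctly precedes "WR_Mapbook__1a.pdf", but "WR_Mapbook__10.pdf" is placed before both "WR_Mapbook__1a.pdf" and "WR_Mapbook__2.pdf". If the real folder contains multi-digit numbers, a natural-order key is likely the safer choice.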

Sorting a list of files with Python
