Question

我正在尝试使用PyPDF模块创建Python脚本。脚本执行的操作是“Root”文件夹，合并其中的所有PDF并在“Output”文件夹中输出合并的PDF并将其重命名为“Root.pdf”（包含拆分PDF的文件夹）。它的作用是对子目录做同样的操作，给最终输出一个等于子目录的名称。

我来处理子目录时遇到困难，给我一些与一些十六进制值相关的错误代码。（似乎它得到一个不是十六进制的空值）

以下是生成的错误代码：

    Traceback (most recent call last):
  File "C:\Documents and Settings\student3\Desktop\Test\pdfMergerV1.py", line 76, in <module>
    files_recursively(path)
  File "C:\Documents and Settings\student3\Desktop\Test\pdfMergerV1.py", line 74, in files_recursively
    os.path.walk(path, process_file, ())
  File "C:\Python27\lib\ntpath.py", line 263, in walk
    walk(name, func, arg)
  File "C:\Python27\lib\ntpath.py", line 259, in walk
    func(arg, top, names)
  File "C:\Documents and Settings\student3\Desktop\Test\pdfMergerV1.py", line 38, in process_file
    pdf = PdfFileReader(file( filename, "rb"))
  File "C:\Python27\lib\site-packages\pyPdf\pdf.py", line 374, in __init__
    self.read(stream)
  File "C:\Python27\lib\site-packages\pyPdf\pdf.py", line 775, in read
    newTrailer = readObject(stream, self)
  File "C:\Python27\lib\site-packages\pyPdf\generic.py", line 67, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "C:\Python27\lib\site-packages\pyPdf\generic.py", line 531, in readFromStream
    value = readObject(stream, pdf)
  File "C:\Python27\lib\site-packages\pyPdf\generic.py", line 58, in readObject
    return ArrayObject.readFromStream(stream, pdf)
  File "C:\Python27\lib\site-packages\pyPdf\generic.py", line 153, in readFromStream
    arr.append(readObject(stream, pdf))
  File "C:\Python27\lib\site-packages\pyPdf\generic.py", line 69, in readObject
    return readHexStringFromStream(stream)
  File "C:\Python27\lib\site-packages\pyPdf\generic.py", line 276, in readHexStringFromStream
    txt += chr(int(x, base=16))
ValueError: invalid literal for int() with base 16: '\x00\x00'

这是脚本的源代码：

 #----------------------------------------------------------------------------------------------
# Name:        pdfMerger
# Purpose:     Automatic merging of all PDF files in a directory and its sub-directories and
#              rename them according to the folder itself. Requires the pyPDF Module
#
# Current:     Processes all the PDF files in the current directory
# To-Do:       Process the sub-directories.
#
# Version: 1.0
# Author:      Brian Livori
#
# Created:     03/08/2011
# Copyright:   (c) Brian Livori 2011
# Licence:     Open-Source
#---------------------------------------------------------------------------------------------
#!/usr/bin/env <strong class="highlight">python</strong>

import os
import glob
import sys
import fnmatch

from pyPdf import PdfFileReader, PdfFileWriter

output = PdfFileWriter()

path = str(os.getcwd())

x = 0

def process_file(_, path, filelist):
    for filename in filelist:
        if filename.endswith('.pdf'):

            filename = os.path.join(path, filename)
            print "Merging " + filename

            pdf = PdfFileReader(file( filename, "rb"))

            x = pdf.getNumPages()

            i = 0

            while (i != x):

                output.addPage(pdf.getPage(i))
                print "Merging page: " + str(i+1) + "/" + str(x)

                i += 1

                output_dir = "\Output\\"

                ext = ".pdf"
                dir =  os.path.basename(path)
                outputpath = str(os.getcwd()) + output_dir
                final_output = outputpath

                if os.path.exists(final_output) != True:

                                os.mkdir(final_output)
                                outputStream = file(final_output + dir + ext, "wb")
                                os.path.join(outputStream)
                                output.write(outputStream)
                                outputStream.close()

                else:

                                outputStream = file(final_output + dir + ext, "wb")
                                os.path.join(outputStream)
                                output.write(outputStream)
                                outputStream.close()

def files_recursively(topdir):
        os.path.walk(path, process_file, ())

files_recursively(path)

Answer 1

您正在阅读的PDF文件看起来不是有效的PDF文件，或者它们比PyPDF准备的更具异国情调。你确定要阅读好的PDF文件吗？

此外，您的代码中有一些奇怪的东西，但这个可能真的很重要：

output_dir = "\Output\\"

你有一个\O转义序列，这不是你想要的。

运行python脚本时出现问题（pypdf / hex错误）

1 个答案: