Question

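I think this must be a memory problem, but I'm not sure. The program loops through PDFs looking for corrupted files. When a file is corrupted, it writes the file's location to a txt file for me to review later. On the first run, I logged both the pass and the fail cases. After 67381 log entries, it stopped. I then changed the logic so that it only logs errors, but I still print a loop count to the console so I can tell how far along the process is. There are about 190k files to loop through, and the count stops at exactly 67381 every time. The Python program appears to still be running in the background, since memory and CPU usage keep fluctuating, but it's hard to be sure. I also don't know yet whether it would still write errors to the log.

Here is the code: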

import PyPDF2, os
from time import gmtime,strftime

path = raw_input("Enter folder path of PDF files:")
t = open(r'c:\pdf_check\log.txt','w')
count = 1
for dirpath,dnames,fnames in os.walk(path):
    for file in fnames:
        print count
        count = count + 1
        if file.endswith(".pdf"):
            file = os.path.join(dirpath, file)
            try:
                PyPDF2.PdfFileReader(open(file, "rb"))
            except PyPDF2.utils.PdfReadError:
                curdate = strftime("%Y-%m-%d %H:%M:%S", gmtime())
                t.write (str(curdate) + " " + "-" + " " + file + " " + "-" + " " + "fail" + "\n")
            else:
                pass
                #curdate = strftime("%Y-%m-%d %H:%M:%S", gmtime())
                #t.write(str(curdate) + " " + "-" + " " + file + " " + "-" + " " + "pass" + "\n")

t.close()

Edit 1 (new code): new code, same problem:

import PyPDF2, os
from time import gmtime,strftime

path = raw_input("Enter folder path of PDF files:")
t = open(r'c:\pdf_check\log.txt','w')
count = 1
for dirpath,dnames,fnames in os.walk(path):
    for file in fnames:
        print count
        count = count + 1
        if file.endswith(".pdf"):
            file = os.path.join(dirpath, file)
            try:
                with open(file,'rb') as f:
                    PyPDF2.PdfFileReader(f)
            except PyPDF2.utils.PdfReadError:
                curdate = strftime("%Y-%m-%d %H:%M:%S", gmtime())
                t.write (str(curdate) + " " + "-" + " " + file + " " + "-" + " " + "fail" + "\n")
                f.close()
            else:
                pass
                f.close()
                #curdate = strftime("%Y-%m-%d %H:%M:%S", gmtime())
                #t.write(str(curdate) + " " + "-" + " " + file + " " + "-" + " " + "pass" + "\n")

t.close()

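Edit 2: I am now trying to run it on a different machine with more powerful hardware and a different version of Windows (10 Pro instead of Server 2008 R2), but I don't think that is the problem.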

Answer 1

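Try editing one of the .pdf files to make it larger. That way, if the loop number at which your program "stops" gets smaller, you can identify the problem as a memory issue.

Otherwise, it is probably an abnormally large PDF file that takes your program a while to verify.

To debug this, you could print the location of each .pdf file you open, so you can find this specific .pdf and open it manually to investigate further.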
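The print-the-path idea could look roughly like the sketch below. It is written for Python 3 (the question's code is Python 2), and `scan_with_trace` and its `check` parameter are names made up for illustration; `check` stands in for whatever validation you run on each file, such as the `PdfFileReader` call from the question.

```python
import os
import sys
import time


def scan_with_trace(root, check, out=sys.stdout):
    """Walk root and call check(path) on every .pdf, announcing each file first.

    Returns a list of (path, size_in_bytes, seconds_taken) tuples.
    `check` is a hypothetical callable supplied by the caller.
    """
    timings = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".pdf"):
                continue
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            # Print and flush BEFORE the check, so that if the check hangs,
            # the last line on the console names the file responsible.
            out.write("checking %s (%d bytes)\n" % (path, size))
            out.flush()
            started = time.time()
            check(path)
            timings.append((path, size, time.time() - started))
    return timings
```

If the console stops on the same path every run, that file is the one to open by hand. The returned timings also show which files were merely slow rather than stuck.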

Answer 2

Figured it out. The problem was actually caused by a random, very large corrupted PDF. So it's not a loop problem, it's a corrupted file problem.
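For anyone hitting something similar, one way to spot such an outlier up front is to list the largest PDFs before running the check. This is only a sketch in Python 3, not what was actually run here; `largest_pdfs` is a made-up helper name.

```python
import heapq
import os


def largest_pdfs(root, n=10):
    """Return the n largest .pdf files under root as (size_in_bytes, path), biggest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".pdf"):
                path = os.path.join(dirpath, name)
                sizes.append((os.path.getsize(path), path))
    # nlargest sorts by the first tuple element (size), descending.
    return heapq.nlargest(n, sizes)
```

Anything far larger than the rest of the list is worth opening manually before letting the full scan run over it.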

Python loop count stops at 67381
