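Is there a way to close the file that PdfFileReader opens?

Time: 2017-10-30 17:38:19

Tags: python python-2.7 pypdf2

I open a lot of PDFs, and I want to delete each PDF file after it has been parsed, but the files stay open until the program has finished running. How can I close a PDF that I opened with PyPDF2?

Code: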

def getPDFContent(path):
    content = ""
    # Load PDF into pyPDF
    pdf = PyPDF2.PdfFileReader(file(path, "rb"))

    #Check for number of pages, prevents out of bounds errors
    max = 0
    if pdf.numPages > 3:
        max = 3
    else:
        max = (pdf.numPages - 1)

    # Iterate pages
    for i in range(0, max): 
        # Extract text from page and add to content
        content += pdf.getPage(i).extractText() + "\n"
    # Collapse whitespace
    content = " ".join(content.replace(u"\xa0", " ").strip().split())
    #pdf.close()
    return content

3 Answers:

Answer 0: (score: 2)

Open and close the file yourself

f = open(path, "rb")
pdf = PyPDF2.PdfFileReader(f)
f.close()

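PyPDF2 .read()s the stream that you pass in, right in the constructor. So after the initial object construction, you can just toss the file.

A context manager will work too: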

with open(path, "rb") as f:
    pdf = PyPDF2.PdfFileReader(f)
do_other_stuff_with_pdf(pdf)
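To see what the context manager buys you, here is a minimal sketch that uses only the standard library. It assumes a throwaway temporary file standing in for the PDF (demo_path is a made-up name), and it only marks the place where the reader would be built. A cautious variant of the snippet above is to finish all reading inside the with block, in case the reader object still needs the stream afterwards.

```python
import os
import tempfile

# Create a throwaway file to stand in for the PDF (an assumption for this demo).
fd, demo_path = tempfile.mkstemp(suffix=".pdf")
os.close(fd)

with open(demo_path, "rb") as f:
    # The handle is open for as long as we are inside the block.
    open_inside = not f.closed
    # A reader such as PyPDF2.PdfFileReader(f) would be built and used here.

# Leaving the block closed the handle, even if an exception had been raised.
closed_after = f.closed

# With no open handle left, the file can be deleted.
os.remove(demo_path)
```

The point of the design is that the lifetime of the handle is tied to the block rather than to garbage collection of whatever object holds the stream.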

Answer 1: (score: 1)

When you do this:

pdf = PyPDF2.PdfFileReader(file(path, "rb"))

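you are creating a reference to a file handle, but you have no control over when the file gets closed.

You should create a context with the handle instead of passing it in anonymously like that.

I would write: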

with open(path,"rb") as f:

    pdf = PyPDF2.PdfFileReader(f)
    #Check for number of pages, prevents out of bounds errors
    ... do your processing
    # Collapse whitespace
    content = " ".join(content.replace(u"\xa0", " ").strip().split())
# now the file is closed by exiting the block, you can delete it
os.remove(path)
# and return the contents
return content
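The fragment above can be wrapped into a self-contained function along these lines. This is only a sketch: extract_and_delete and its extract callback are hypothetical names, not part of PyPDF2. The callback receives the open stream and returns the raw text; with PyPDF2 it would be the place to build the PdfFileReader and loop over the pages. The demo passes a plain decoder so that it runs without any PDF library.

```python
import os
import tempfile


def extract_and_delete(path, extract):
    """Parse the file at `path` with `extract`, then delete the file.

    `extract` is a hypothetical callback: it takes the open binary stream
    and returns text. All reading happens while the handle is still open.
    """
    with open(path, "rb") as f:
        content = extract(f)
    # Collapse whitespace, as in the original question.
    content = " ".join(content.replace(u"\xa0", " ").strip().split())
    # The with block has closed the handle, so the file can be removed.
    os.remove(path)
    return content


# Demo on a temporary text file standing in for a PDF.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
with open(demo_path, "wb") as out:
    out.write(u"first\xa0page \n\n second   page".encode("utf-8"))

text = extract_and_delete(demo_path, lambda stream: stream.read().decode("utf-8"))
```

Keeping the deletion inside the same function as the with block makes the ordering explicit: read, close, then remove.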

Answer 2: (score: 1)

Yes, you are passing a stream in to PdfFileReader, and you can close it. The with syntax is the best way to do that:

with