将PDF转换/写入为RAM,就像文件一样,以便进一步使用它

时间:2019-07-07 19:02:41

标签: python python-3.x pypdf2

我的脚本生成PDF(PyPDF2.pdf.PdfFileWriter object)并将其存储在变量中。 我需要在脚本中进一步像file-like object一样使用它。但是现在我必须先将其写入HDD。然后,我必须将其打开为文件才能使用。

为了防止这种不必要的写/读操作,我发现了许多解决方案-StringIOBytesIO等。但我找不到能为我提供帮助的东西。

据我了解-我需要将PyPDF2.pdf.PdfFileWriter object转换(或写入RAM)file-like object才能直接使用它。

还是有一种完全适合我情况的方法?

更新-这是代码示例

from pdfrw import PdfReader, PdfWriter, PageMerge
from PyPDF2 import PdfFileReader, PdfFileWriter


red_file = PdfFileReader(open("file_name.pdf", 'rb'))

large_pages_indexes = [1, 7, 9]

large = PdfFileWriter()
for i in large_pages_indexes:
    p = red_file.getPage(i)
    large.addPage(p)

# here final data have to be written (I would like to avoid that)
with open("virtual_file.pdf", 'wb') as tmp:
  large.write(tmp)

# here I need to read exported "virtual_file.pdf" (I would like to avoid that too)
with open("virtual_file.pdf", 'rb') as tmp:
  pdf = PdfReader(tmp) # here I'm starting to work with this file using another module "pdfrw"
  print(pdf)

1 个答案:

答案 0 :(得分:2)

为避免磁盘I / O变慢,您似乎要替换

with open("virtual_file.pdf", 'wb') as tmp:
  large.write(tmp)

with open("virtual_file.pdf", 'rb') as tmp:
  pdf = PdfReader(tmp)

使用

buf = io.BytesIO()
large.write(buf)
buf.seek(0)
pdf = PdfReader(buf)

此外,buf.getvalue()可供您使用。