Question

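I'm writing Python code to download PDF files and PNG files and then extract information from them. However, it has to be compatible with Microsoft Azure, which doesn't allow me to save the files before reading them. Is there a simple way to keep the files in memory while I read them, without having to save them?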
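A minimal sketch, assuming the download arrives as a sequence of byte chunks (as it does from a `requests` response's `iter_content`): collect the chunks into an `io.BytesIO` buffer, which behaves like an open binary file but lives entirely in RAM. To show information being extracted without saving anything, it reads a PNG's width and height from the IHDR chunk using only the standard library. The helper names `buffer_chunks` and `png_size` are hypothetical, chosen for illustration.

```python
import io
import struct
from typing import Iterable, Tuple

# Every PNG file starts with this fixed 8-byte signature
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def buffer_chunks(chunks: Iterable[bytes]) -> io.BytesIO:
    """Collect downloaded chunks into an in-memory binary file (hypothetical helper)."""
    buf = io.BytesIO()
    for chunk in chunks:
        if chunk:  # skip empty keep-alive chunks
            buf.write(chunk)
    buf.seek(0)  # rewind so readers start at the first byte
    return buf


def png_size(fp: io.BytesIO) -> Tuple[int, int]:
    """Return (width, height) from the IHDR chunk of a PNG file object."""
    # Signature (8 bytes) + chunk length (4) + chunk type (4) + width (4) + height (4)
    header = fp.read(24)
    if header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # IHDR data starts at byte 16: big-endian unsigned width, then height
    width, height = struct.unpack(">II", header[16:24])
    return width, height
```

With `requests`, you would pass `resp.iter_content(chunk_size=1024)` from a `requests.get(url, stream=True)` response to `buffer_chunks`. The resulting buffer can also be handed to PDF libraries that accept a file-like object instead of a path.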

Answer 1

I found a solution using a temporary file, like this. By the way, I'm working with pollen data.

import tempfile

import PyPDF2

# pollenfile is assumed to be a streamed requests response,
# e.g. pollenfile = requests.get(pdf_url, stream=True)
with tempfile.TemporaryFile() as fp:
    # Write the PDF one chunk at a time, because the file is large
    for chunk in pollenfile.iter_content(chunk_size=1024):
        if chunk:  # skip empty keep-alive chunks
            fp.write(chunk)

    # After writing, the file position is at the end; rewind before reading
    fp.seek(0)

    # Current PyPDF2 API; older releases spelled these
    # PdfFileReader, numPages, getPage(i) and extractText()
    pdf_reader = PyPDF2.PdfReader(fp)
    pollen_txt = ""
    # Read every page and append its text
    for page in pdf_reader.pages:
        pollen_txt += page.extract_text()

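By doing this, I extracted the PDF's text as a string without keeping a saved copy of the file on my computer, and I can work with that string later.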
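Note that `tempfile.TemporaryFile` still creates a file in the system temp directory (on POSIX its directory entry is removed right away, but the data can still be written to disk). If the environment forbids touching disk at all, `io.BytesIO` is the fully in-memory option. A middle ground is `tempfile.SpooledTemporaryFile`, which keeps the data in memory until it grows past `max_size` bytes. A minimal sketch; the helper name `spool_chunks` and the 10 MiB default are assumptions for illustration, not part of the original answer.

```python
import tempfile
from typing import Iterable


def spool_chunks(
    chunks: Iterable[bytes], max_size: int = 10 * 1024 * 1024
) -> tempfile.SpooledTemporaryFile:
    """Write chunks to a SpooledTemporaryFile and rewind it (hypothetical helper).

    The data stays in memory until it exceeds max_size bytes; only then does
    Python roll it over to a real temporary file on disk.
    """
    spool = tempfile.SpooledTemporaryFile(max_size=max_size)
    for chunk in chunks:
        if chunk:  # skip empty keep-alive chunks
            spool.write(chunk)
    spool.seek(0)  # rewind so the reader starts at the first byte
    return spool
```

In the answer's code this would replace the `with tempfile.TemporaryFile() as fp:` block with something like `with spool_chunks(pollenfile.iter_content(chunk_size=1024)) as fp:`, then `PyPDF2.PdfReader(fp)` as before; pick `max_size` larger than the biggest PDF you expect if the file must never reach disk.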
