Question

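I'm building a system that processes data from PDF files (I use the PyPDF2 lib for this). I now receive a base64-encoded PDF, which I can decode and store correctly using the following code: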

import base64
# base64FileData  <= the base64 file data
fileData = base64.urlsafe_b64decode(base64FileData.encode('UTF-8'))
with open('thefilename.pdf', 'w') as theFile:
    theFile.write(fileData)

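I now want to use this fileData as a binary file so I can split it up, but when I call type(fileData), fileData turns out to be <type 'str'>. How can I convert this fileData to binary (or at least to something that is not a string)?

All tips are welcome!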

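[EDIT]

If I do open(fileData, 'rb') I get an error saying

TypeError: file() argument 1 must be encoded string without NULL bytes, not str

To remove the null bytes I tried fileData.rstrip(' \t\r\n\0'), fileData.rstrip('\0') and fileData.partition(b'\0')[0], but nothing seems to work. Any ideas?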

[EDIT2]

The problem is that I pass this string to the PyPDF2 PdfFileReader class, which on lines 909 to 912 does the following (where stream is the fileData I supply):

if type(stream) in (string_type, str):
    fileobj = open(stream, 'rb')
    stream = BytesIO(b_(fileobj.read()))
    fileobj.close()

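Because it is a string, PdfFileReader assumes it is a filename and tries to open that file, which fails with the TypeError. So before passing fileData to PdfFileReader I need to convert it into something other than str, so that it does not try to open it and instead treats fileData as the file contents themselves. Any ideas?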

Answer 1

To open the file in binary mode you have to use 'wb'; otherwise the data is basically saved as "text".

import base64
# base64FileData  <= the base64 file data
fileData = base64.urlsafe_b64decode(base64FileData.encode('UTF-8'))
with open('thefilename.pdf', 'wb') as theFile:
    theFile.write(fileData)
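
As a quick check of the point above, here is a minimal sketch that writes decoded data in 'wb' mode and reads it back in 'rb' mode. The sample bytes and the temporary directory are made up for illustration; they are not the asker's real PDF.

```python
import base64
import os
import tempfile

# Hypothetical stand-in for the real payload: a few PDF-like bytes,
# deliberately including a NUL byte and a non-ASCII byte.
original = b"%PDF-1.4\n\x00\xff binary payload\n%%EOF\n"
base64FileData = base64.urlsafe_b64encode(original).decode("ascii")

# Same decoding step as in the question.
fileData = base64.urlsafe_b64decode(base64FileData.encode("UTF-8"))

# Write in binary mode so no newline or encoding translation is applied.
path = os.path.join(tempfile.mkdtemp(), "thefilename.pdf")
with open(path, "wb") as theFile:
    theFile.write(fileData)

# Read back in binary mode and compare with the original bytes.
with open(path, "rb") as theFile:
    roundTripped = theFile.read()
```

If the two byte strings match, the file on disk is an exact copy of the decoded data.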

Answer 2

As an example, suppose your input data comes from this:

with open(local_image_path, "rb") as imageFile:
    str_image_data = base64.b64encode(imageFile.read())

Then, to get a binary variable, you can try:

import io
import base64

binary_image_data = io.BytesIO(base64.decodebytes(str_image_data))
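
The same idea can be sketched for the fileData from the question. This assumes, as the BytesIO line quoted from PyPDF2 suggests, that PdfFileReader works on a file-like stream; the sample bytes here are made up and the PyPDF2 call itself is left out so the snippet runs on its own.

```python
import base64
import io

# Hypothetical stand-in for the decoded PDF, including a NUL byte.
pdf_bytes = b"%PDF-1.4\n\x00 hypothetical payload\n%%EOF\n"
base64FileData = base64.urlsafe_b64encode(pdf_bytes).decode("ascii")

# Same decoding step as in the question.
fileData = base64.urlsafe_b64decode(base64FileData.encode("UTF-8"))

# Wrap the raw bytes in an in-memory binary stream. The result is not a
# str, so a check like `type(stream) in (string_type, str)` would not
# mistake it for a filename.
stream = io.BytesIO(fileData)

# The stream supports the usual file methods.
header = stream.read(8)
stream.seek(0)  # rewind so a later reader starts at the beginning
```

A stream built this way never touches the disk, which avoids the open() call that raised the TypeError.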

How to decode a base64 file into a binary file in Python?
