Question

I'm using Python 3.4.2 and PyPDF2 1.24 on Windows 7 (also reportlab 3.1.44, in case that helps).

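I recently upgraded from Python 2.7 to 3.4 and am porting my code. This code creates a blank PDF page with embedded links (using reportlab) and merges it (using PyPDF2) with an existing PDF page. I first hit a reportlab problem: saving the canvas used StringIO, which had to be changed to BytesIO. After that change, I got this error: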

Traceback (most recent call last):
  File "C:\cms_software\pdf_replica\builder.py", line 401, in merge_pdf_files
    input_page.mergePage(link_page)
  File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 2013, in mergePage
    self._mergePage(page2)
  File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 2059, in _mergePage
    page2Content = PageObject._pushPopGS(page2Content, self.pdf)
  File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 1973, in _pushPopGS
    stream = ContentStream(contents, pdf)
  File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 2446, in __init__
    stream = BytesIO(b_(stream.getData()))
  File "C:\Python34\lib\site-packages\PyPDF2\generic.py", line 826, in getData
    decoded._data = filters.decodeStreamData(self)
  File "C:\Python34\lib\site-packages\PyPDF2\filters.py", line 326, in decodeStreamData
    data = ASCII85Decode.decode(data)
  File "C:\Python34\lib\site-packages\PyPDF2\filters.py", line 264, in decode
    data = [y for y in data if not (y in ' \n\r\t')]
  File "C:\Python34\lib\site-packages\PyPDF2\filters.py", line 264, in <listcomp>
    data = [y for y in data if not (y in ' \n\r\t')]
TypeError: 'in <string>' requires string as left operand, not int
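The last two frames show where it actually fails. The stream data reaching `ASCII85Decode.decode` is `bytes`, and in Python 3 iterating over `bytes` yields integers, not one-character strings as in Python 2. So `y in ' \n\r\t'` tests an `int` against a `str`. Here is a minimal standard-library reproduction. The sample data is arbitrary ASCII85-style text, not taken from the question's PDF:

```python
# Arbitrary ASCII85-style stream data, held as bytes (as PyPDF2 sees it on Python 3).
data = b"87cURD]i,\n\"Ebo80~>"

# Iterating over bytes yields ints in Python 3 (1-char strs in Python 2).
first_item = next(iter(data))

# The same filter expression as PyPDF2 1.24's filters.py, line 264.
error = None
try:
    [y for y in data if not (y in ' \n\r\t')]
except TypeError as exc:
    error = exc  # 'in <string>' requires string as left operand, not int

# A bytes-aware version tests int membership in a bytes object instead.
stripped_ok = bytes(y for y in data if y not in b' \n\r\t')
```

The bytes-aware line is only meant to illustrate the type mismatch. It is not a patch for PyPDF2 as a whole.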

Here is the line the traceback refers to, along with the lines above it:

link_page = self.make_pdf_link_page(pdf, size, margin, scale_factor, debug_article_links)
if link_page is not None:
    input_page.mergePage(link_page)

Here is the relevant part of the make_pdf_link_page function:

# module-level imports this snippet relies on
import io
from reportlab.pdfgen import canvas
from reportlab.lib import colors
from PyPDF2 import PdfFileReader

packet = io.BytesIO()
can = canvas.Canvas(packet, pagesize=(size['width'], size['height']))
# ... (omitted: reportlab specifics for size and URL handling)
can.linkURL(url, r1, thickness=1, color=colors.green)
can.rect(x1, y1, width, height, stroke=1, fill=0)
# create a new PDF with reportlab that has the URL link embedded
can.save()
packet.seek(0)  # rewind so PdfFileReader reads from the start of the buffer
try:
    new_pdf = PdfFileReader(packet)
except Exception as e:
    logger.exception(e)
    return None
return new_pdf.getPage(0)

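I suspect BytesIO is causing the problem, but I can't create the page with reportlab using StringIO. This is a key feature that worked perfectly under Python 2.7, so I'd appreciate any feedback. Thanks!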
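On the StringIO point: in Python 3, `io.StringIO` is a text buffer that only accepts `str`, while a PDF is binary data written as `bytes`. A binary buffer such as `io.BytesIO` is therefore the right choice. The sketch below uses only the standard library. The few header bytes are an assumed stand-in for real reportlab output:

```python
import io

# Stand-in for the bytes a PDF writer produces; a real PDF is binary, not text.
pdf_bytes = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"

# In Python 3, StringIO is a text buffer and refuses bytes.
stringio_error = None
try:
    io.StringIO().write(pdf_bytes)
except TypeError as exc:
    stringio_error = exc  # string argument expected, got 'bytes'

# BytesIO is the binary equivalent and accepts the data as-is.
packet = io.BytesIO()
packet.write(pdf_bytes)

# Writing leaves the position at the end, so rewind before reading;
# this is what the packet.seek(0) call in the code above is for.
packet.seek(0)
round_trip = packet.read()
```

The tempfile experiment in the update below also points away from BytesIO as the cause, since writing to a real file gives the same error.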

Update: I also tried switching from BytesIO to writing a temporary file and merging that. Unfortunately, I got the same error. Here is the tempfile version:

import os
import tempfile
temp_dir = tempfile.gettempdir()
temp_path = os.path.join(temp_dir, "tmp.pdf")
can = canvas.Canvas(temp_path, pagesize=(size['width'], size['height']))
# ... (same drawing code as above)
can.showPage()
can.save()
try:
    new_pdf = PdfFileReader(temp_path)
except Exception as e:
    logger.exception(e)
    return None
return new_pdf.getPage(0)

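Update: I found something interesting. If I comment out the can.rect and can.linkURL calls, the merge works. So drawing anything on the page and then merging it with an existing PDF triggers the error.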

Answer 1

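I found my own answer after digging into the PyPDF2library code. Older libraries can be tricky for Python 3 users. Even when they claim Python 3 support, they don't necessarily test every code path. Here, the problem is the ASCII85Decode class in PyPDF2's filters.py. It treats the stream data as a str. Under Python 3, however, the data is bytes, and iterating over bytes yields ints, which is what raises the TypeError. The class also needs to return bytes. I borrowed the code for the equivalent function from pdfminer3k, a Python 3 port of pdfminer. If you swap this code in for the ASCII85Decode() class, it works: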

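import struct

class ASCII85Decode(object):
    """ASCII85 decoder adapted from pdfminer3k; always returns bytes on Python 3."""
    @staticmethod
    def decode(data, decodeParms=None):
        if isinstance(data, str):
            data = data.encode('ascii')
        n = b = 0  # n: digits in the current 5-char group, b: accumulated value
        out = bytearray()
        for c in data:  # c is an int when iterating over bytes
            if ord('!') <= c <= ord('u'):
                n += 1
                b = b * 85 + (c - 33)
                if n == 5:
                    out += struct.pack('>L', b)
                    n = b = 0
            elif c == ord('z'):
                # 'z' is shorthand for four zero bytes, only valid between groups
                assert n == 0
                out += b'\0\0\0\0'
            elif c == ord('~'):
                # '~>' ends the data; pad a partial group with 'u' (value 84)
                if n:
                    for _ in range(5 - n):
                        b = b * 85 + 84
                    out += struct.pack('>L', b)[:n - 1]
                break
            # any other byte (e.g. whitespace) is skipped
        return bytes(out)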
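As an alternative to copying the pdfminer3k loop, Python 3.4 and later include an ASCII85 decoder in the standard library, `base64.a85decode`. The function name `ascii85_decode_stdlib` below is hypothetical, and this sketch has not been tested inside PyPDF2. It assumes the stream data may end with the `~>` end-of-data marker used by PDF ASCII85 streams. `a85decode` rejects that marker in its default mode, so the sketch strips it first:

```python
import base64


def ascii85_decode_stdlib(data, decodeParms=None):
    """Hypothetical stdlib-based replacement body for ASCII85Decode.decode.

    Accepts str or bytes and always returns bytes.
    """
    if isinstance(data, str):
        data = data.encode("ascii")
    data = data.strip()
    # PDF ASCII85 streams end with '~>'; drop the marker and anything after it.
    end = data.find(b"~>")
    if end != -1:
        data = data[:end]
    # Tolerate an optional Adobe-style '<~' prefix as well.
    if data.startswith(b"<~"):
        data = data[2:]
    # Skip the PDF whitespace characters (NUL, tab, LF, VT, FF, CR, space);
    # a85decode expands 'z' to four zero bytes and handles a short last group.
    return base64.a85decode(data, ignorechars=b" \t\n\r\v\f\x00")


# Usage: encode some bytes, add PDF-style line breaks and the '~>' marker,
# then decode them again.
original = b"BT /F1 12 Tf 72 712 Td (link) Tj ET"
encoded = base64.a85encode(original) + b"~>"
pdf_style = encoded[:10] + b"\r\n" + encoded[10:]
decoded = ascii85_decode_stdlib(pdf_style)
```

Malformed input raises `ValueError` from `a85decode` rather than the `AssertionError` the loop above raises for a misplaced `z`.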

Porting to Python 3: PyPDF2 mergePage() gives TypeError
