Filled PDF form fields display differently in different contexts

Time: 2019-07-03 18:44:21

Tags: python-3.x pdf pypdf2

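I have a Python script that creates a number of PDF forms (0-10) and then concatenates them into a single form. The fields in the compiled PDF display differently in 4 different contexts. I am developing on Debian Linux, where the PDF viewer (Okular) does not show any of the fields in the compiled PDF. On Windows 10, if I open the PDF in Chrome, I have to hover over a field to see its value; the first page has the correct field data, but every subsequent page is just a copy of the first page, which is incorrect. If I open the PDF in Microsoft Edge, it correctly displays the form data for each page, but when I print from Edge, none of the form data appears.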
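The symptoms above (later pages mirroring the first, values visible only on hover or not at all) are consistent with two common causes: several fields sharing one name after the merge, and fields lacking up-to-date appearance streams. Neither is confirmed for this file. The following is a rough, standard-library-only sketch for checking both on the merged output. The function names `inspect_form` and `inspect_form_file` are hypothetical, and the scan assumes the field dictionaries are stored uncompressed, with names written as literal strings without escaped parentheses.

```python
import re
from collections import Counter

# Matches a partial field name written as a literal string, e.g. "/T (course)".
# Assumption: no escaped parentheses inside the name and no hex-string names.
FIELD_NAME_RE = re.compile(rb"/T\s*\(([^()\\]*)\)")
# Matches "/NeedAppearances true" as it would appear in the AcroForm dictionary.
NEED_APPEARANCES_RE = re.compile(rb"/NeedAppearances\s+true")


def inspect_form(pdf_bytes):
    """Return (duplicated field names, whether /NeedAppearances true is present)."""
    names = [m.decode("latin-1") for m in FIELD_NAME_RE.findall(pdf_bytes)]
    counts = Counter(names)
    # Keep only the names that occur more than once in the raw file.
    duplicates = {name: n for name, n in counts.items() if n > 1}
    return duplicates, bool(NEED_APPEARANCES_RE.search(pdf_bytes))


def inspect_form_file(path):
    """Convenience wrapper that reads the PDF at `path` from disk."""
    with open(path, "rb") as fh:
        return inspect_form(fh.read())
```

If the first return value is non-empty for the merged file, the copies of the template probably share field names, which would explain a viewer showing one value everywhere. A repeated name can also be legitimate (one field with several widgets), so treat the result as a hint rather than proof.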

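I am using pdfrw to write to the PDFs and PyPDF2 to merge them. I have tried a number of different things, including trying to flatten the PDF using Python (for which there is very little support, by the way), reading and writing instead of merging, attempting to convert the form fields to text, and many other approaches that I have since forgotten because they did not work.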

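import pdfrw
from PyPDF2 import PdfFileReader, PdfFileWriter
from PyPDF2.generic import BooleanObject, IndirectObject, NameObject

# Assumed values: these constants are used below but are not defined in the post.
Annot_Key = '/Annots'
Annot_Field_Key = '/T'
Subtype_Key = '/Subtype'
Widget_Subtype_Key = '/Widget'


def writeToPdf(unfilled, output, data, fields):
    '''Fill the form in unfilled with the values from data and save it as output'''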
    # TODO: Use literal declarations for lists, dicts, etc
    checkboxes = [
        'misconduct_complete',
        'misconduct_incomplete',
        'not_final_exam',
        'supervise_exam',
        'not_final_home_exam',
        'not_final_assignment',
        'not_final_oral_exam',
        'not_final_lab_exam',
        'not_final_practical_exam',
        'not_final_other'
    ]
    template_pdf = pdfrw.PdfReader(unfilled)
    annotations = template_pdf.pages[0][Annot_Key]
    for annotation in annotations:
        # TODO: Singly nested if's with no else's suggest a logic problem, find a clearer way to do this.
        if annotation[Subtype_Key] == Widget_Subtype_Key:
            if annotation[Annot_Field_Key]:
                key = annotation[Annot_Field_Key][1:-1]
                if key in fields:
                    if key in checkboxes:
                        annotation.update(pdfrw.PdfDict(AS=pdfrw.PdfName('Yes')))
                    elif key == 'course':
                        annotation.update(pdfrw.PdfDict(V='{}'.format(data[key][0:8])))
                    else:
                        annotation.update(pdfrw.PdfDict(V='{}'.format(data[key])))
    pdfrw.PdfWriter().write(output, template_pdf)


def set_need_appearances_writer(writer):
    # basically used to ensure there are no
    # overlapping form fields, which make printing hard
    try:
        catalog = writer._root_object
        # get the AcroForm tree and add the "/NeedAppearances" attribute
        if "/AcroForm" not in catalog:
            writer._root_object.update({
                NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)})

        need_appearances = NameObject("/NeedAppearances")
        writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)


    except Exception as e:
        print('set_need_appearances_writer() catch : ', repr(e))

    return writer


def mergePDFs(listOfPdfPaths, outputPDf):
    '''Merge a list of PDFs into a single one and save it to outputPDf'''
    pdf_writer = PdfFileWriter()
    set_need_appearances_writer(pdf_writer)
    pdf_writer.setPageMode('/UseOC')

    for path in listOfPdfPaths:
        pdf_reader = PdfFileReader(path)
        for page in range(pdf_reader.getNumPages()):
            pdf_writer.addPage(pdf_reader.getPage(page))

    with open(outputPDf, 'wb') as fh:
        pdf_writer.write(fh)

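As described above, the results differ by context. On Debian Linux, Okular does not display any of the form data. On Windows 10, Google Chrome shows duplicated fields after the first page (and I have to hover over or click a field to see its value). Microsoft Edge displays the correct content, with each page having its own field data, but its print preview shows no form data either.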
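If the duplicated values in Chrome do come from every copy of the template reusing the same field names (an assumption, not something verified against this file), one commonly suggested workaround is to give each copy its own names before merging. A minimal sketch of the naming step only; `per_copy_name` and `rename_plan` are hypothetical helpers that are not part of pdfrw or PyPDF2.

```python
def per_copy_name(key, copy_index):
    """Build a field name that is unique to one copy of the template."""
    return "{}_{}".format(key, copy_index)


def rename_plan(field_names, copy_index):
    """Map each original field name to its per-copy replacement."""
    return {name: per_copy_name(name, copy_index) for name in field_names}
```

Inside `writeToPdf`, the new name could presumably be written the same way the value already is, for example `annotation.update(pdfrw.PdfDict(T=plan[key]))`, with `plan = rename_plan(fields, copy_index)` and `copy_index` passed in as an extra argument. This is untested here and would not cover fields whose name is inherited from a parent field.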

0 Answers:

No answers