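I have a Python script that creates a number of PDF forms (0-10) and then concatenates them into a single PDF. The fields in the compiled PDF display differently in 4 different contexts. I am developing on Debian Linux, where the PDF viewer (Okular) does not show any of the fields in the compiled PDF. On Windows 10, if I open the PDF in Chrome, I have to hover over a field to see its value; the first page has the correct field data, but every subsequent page is just a copy of the first page, which is incorrect. If I open the PDF in Microsoft Edge, it displays the form data correctly for each page, but when I print from Edge, no form data appears.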
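I am using pdfrw to write to the PDFs and PyPDF2 to merge them. I have tried many different things, including flattening the PDF with Python (for which there is very little support, by the way), reading and writing instead of merging, converting the form fields to text, and many other approaches that I have since forgotten because they did not work.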
import pdfrw
from PyPDF2 import PdfFileReader, PdfFileWriter
from PyPDF2.generic import BooleanObject, IndirectObject, NameObject

# The key constants were not shown in the post; the values below are
# assumed (the usual PDF dictionary keys for form widgets).
Annot_Key = '/Annots'
Annot_Field_Key = '/T'
Subtype_Key = '/Subtype'
Widget_Subtype_Key = '/Widget'


def writeToPdf(unfilled, output, data, fields):
    '''Function writes the data from data to unfilled, and saves it as output'''
    # TODO: Use literal declarations for lists, dicts, etc
    checkboxes = [
        'misconduct_complete',
        'misconduct_incomplete',
        'not_final_exam',
        'supervise_exam',
        'not_final_home_exam',
        'not_final_assignment',
        'not_final_oral_exam',
        'not_final_lab_exam',
        'not_final_practical_exam',
        'not_final_other'
    ]
    template_pdf = pdfrw.PdfReader(unfilled)
    # Only the annotations on the first page of the template are filled in.
    annotations = template_pdf.pages[0][Annot_Key]
    for annotation in annotations:
        # TODO: Singly nested if's with no else's suggest a logic problem, find a clearer way to do this.
        if annotation[Subtype_Key] == Widget_Subtype_Key:
            if annotation[Annot_Field_Key]:
                # The field name is stored as '(name)'; strip the parentheses.
                key = annotation[Annot_Field_Key][1:-1]
                if key in fields:
                    if key in checkboxes:
                        # Tick the checkbox by selecting its 'Yes' appearance state.
                        annotation.update(pdfrw.PdfDict(AS=pdfrw.PdfName('Yes')))
                    elif key == 'course':
                        # The course field only holds the first 8 characters.
                        annotation.update(pdfrw.PdfDict(V='{}'.format(data[key][0:8])))
                    else:
                        annotation.update(pdfrw.PdfDict(V='{}'.format(data[key])))
    pdfrw.PdfWriter().write(output, template_pdf)


def set_need_appearances_writer(writer):
    # basically used to ensure there are no
    # overlapping form fields, which make printing hard
    try:
        catalog = writer._root_object
        # get the AcroForm tree and add the "/NeedAppearances" attribute
        if "/AcroForm" not in catalog:
            writer._root_object.update({
                NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)})
        need_appearances = NameObject("/NeedAppearances")
        writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
    except Exception as e:
        print('set_need_appearances_writer() catch : ', repr(e))
    return writer


def mergePDFs(listOfPdfPaths, outputPDf):
    '''Function Merges a list of pdfs into a single one, and saves it to outputPDf'''
    pdf_writer = PdfFileWriter()
    set_need_appearances_writer(pdf_writer)
    pdf_writer.setPageMode('/UseOC')
    for path in listOfPdfPaths:
        pdf_reader = PdfFileReader(path)
        # Copy every page of the filled form into the merged document.
        for page in range(pdf_reader.getNumPages()):
            pdf_writer.addPage(pdf_reader.getPage(page))
    with open(outputPDf, 'wb') as fh:
        pdf_writer.write(fh)
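As an aside, the nested-if TODO in writeToPdf could probably be addressed by moving the per-field decision into a small pure function that does not touch pdfrw at all. The following is only a sketch: `field_update`, its `(entry, value)` return shape, and the shortened `CHECKBOXES` set are hypothetical names introduced for illustration, not part of pdfrw or PyPDF2.

```python
# Shortened checkbox list, for illustration only (the real list is longer).
CHECKBOXES = frozenset({'misconduct_complete', 'supervise_exam'})


def field_update(key, data, fields, checkboxes=CHECKBOXES):
    """Return the (entry, value) pair to write for one field, or None to skip it.

    Hypothetical helper: 'AS' means "set the appearance state" (checkboxes),
    'V' means "set the field value" (text fields).
    """
    if key not in fields:
        return None
    if key in checkboxes:
        return ('AS', 'Yes')
    # Same rule as the original code: 'course' is cut to its first 8 characters.
    value = data[key][0:8] if key == 'course' else data[key]
    return ('V', '{}'.format(value))


if __name__ == '__main__':
    sample = {'course': 'COMP1234-extra', 'name': 'Ada'}
    for name in ('course', 'name', 'supervise_exam', 'unknown'):
        print(name, field_update(name, sample, ['course', 'name', 'supervise_exam']))
```

Inside the annotation loop, the result would still have to be turned into a pdfrw update (wrapping the `'Yes'` value in `pdfrw.PdfName` for the `AS` case), so this only flattens the decision logic and does not change what is written to the PDF.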
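As described above, the results differ by context. On Debian Linux, Okular does not show any form data. On Windows 10, Chrome shows duplicated fields after the first page (and I have to hover over or click a field to see its value). Microsoft Edge shows the correct content, with each page having its own field data, but the print preview does not show the form data either.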