Batch-filling PDF forms from Python or Bash

Time: 2012-05-07 03:13:53

Tags: python forms pdf automation

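I have a PDF form that needs to be filled out many times (a timesheet, to be exact). Since I don't want to do this by hand, I'm looking for a way to fill it in with a Python script or with a tool that can be used from a Bash script.

Does anyone have experience with this?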

4 answers:

Answer 0: (score: 13)

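For Python, you need the fdfgen library and pdftk.

@Hugh Bothwell's comment is 100% correct, so I'll expand on that answer with a working implementation.

If you are on Windows, you also need to make sure that python and pdftk are on the system path (unless you want to type out long folder names).

The following code batch-fills a set of PDF forms automatically from a CSV data file: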
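A quick way to check the PATH requirement before running anything is to ask Python where it finds the executable. This is only a minimal sketch using the standard library; the helper name `find_tool` is made up for illustration.

```python
import shutil


def find_tool(name="pdftk"):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool("pdftk")
    if path is None:
        print("pdftk was not found on PATH; install it or add its folder to PATH")
    else:
        print("Using pdftk at", path)
```

If this prints the "not found" message, the batch script would fail at the pdftk step, so it is worth fixing the PATH first.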

import csv
import os
from fdfgen import forge_fdf

# Configuration: adjust these names for your own form.
filename_prefix = "NVC"
csv_file = "NVC.csv"
pdf_file = "NVC.pdf"
tmp_file = "tmp.fdf"
output_folder = './output/'


def process_csv(file):
    """Return one list of (field_name, value) tuples per CSV data row."""
    data = []
    with open(file, newline='') as f:
        csv_data = csv.reader(f)
        headers = next(csv_data, [])  # the first row holds the PDF field names
        for row in csv_data:
            if not row:  # ignore blank lines
                continue
            data.append(list(zip(headers, row)))
    return data


def form_fill(fields):
    """Fill the PDF template with one row of data by calling pdftk."""
    fdf = forge_fdf("", fields, [], [], [])
    # Recent fdfgen versions return bytes under Python 3, so write in binary mode.
    with open(tmp_file, "wb") as fdf_file:
        fdf_file.write(fdf)
    # The output name uses the value of the second CSV column.
    output_file = '{0}{1} {2}.pdf'.format(output_folder, filename_prefix, fields[1][1])
    cmd = 'pdftk "{0}" fill_form "{1}" output "{2}" dont_ask'.format(pdf_file, tmp_file, output_file)
    os.system(cmd)
    os.remove(tmp_file)


data = process_csv(csv_file)
print('Generating Forms:')
print('-----------------------')
for record in data:
    # Rows whose first column is 'Yes' are skipped.
    if record[0][1] == 'Yes':
        continue
    print('{0} {1} created...'.format(filename_prefix, record[1][1]))
    form_fill(record)

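Note: figuring out how to customize this shouldn't be rocket surgery. The variable declarations at the top hold the custom configuration.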

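In the CSV, each column of the first row holds the name of the corresponding field in the PDF file. Any column that has no matching field in the template is ignored.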
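To make the header-to-field mapping concrete, here is a small sketch that turns CSV text into the list of (field name, value) pairs that gets passed to forge_fdf. The column names `Skip`, `Name` and `Hours` are invented for the example; in a real file they would have to match the field names in your PDF.

```python
import csv
import io


def rows_to_fields(csv_text):
    """Pair every value with the header of its column, one list per data row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [list(row.items()) for row in reader]


# Hypothetical data: the first column plays the role of the skip flag.
sample = "Skip,Name,Hours\nNo,Alice,40\nYes,Bob,0\n"
records = rows_to_fields(sample)

for record in records:
    print(record)
```

Each printed record is a list such as `[('Skip', 'No'), ('Name', 'Alice'), ('Hours', '40')]`, which is the shape the script above builds by hand.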

In the PDF template, just create editable fields wherever you want data filled in, and make sure their names match the CSV data.

For this particular configuration, just put this script in the same folder as NVC.csv, NVC.pdf and a folder named "output". Run it and it does the rest automatically.
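If file names contain spaces or quotes, building the pdftk call as an argument list avoids shell quoting problems. The sketch below is one possible variation, not part of the script above; the function names are made up, and the optional `flatten` keyword is pdftk's own output option for making the filled fields non-editable.

```python
import subprocess


def build_pdftk_command(pdf_file, fdf_file, output_file, flatten=False):
    """Build the pdftk fill_form call as a list of arguments."""
    cmd = ["pdftk", pdf_file, "fill_form", fdf_file, "output", output_file]
    if flatten:
        cmd.append("flatten")
    cmd.append("dont_ask")
    return cmd


def run_pdftk(pdf_file, fdf_file, output_file, flatten=False):
    """Run pdftk and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_pdftk_command(pdf_file, fdf_file, output_file, flatten), check=True)


print(build_pdftk_command("NVC.pdf", "tmp.fdf", "./output/NVC 1.pdf"))
```

With `check=True` a failing pdftk call stops the batch instead of silently producing no file, which `os.system` would not do.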

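Answer 1: (score: 8)

A faster version that needs neither pdftk nor fdfgen, in pure Python 3.6+ (it relies on PyPDF2's older camelCase API, which PyPDF2 3.0 removed):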

# -*- coding: utf-8 -*-

from collections import OrderedDict
from PyPDF2 import PdfFileWriter, PdfFileReader


def _getFields(obj, tree=None, retval=None, fileobj=None):
    """
    Extracts field data if this PDF contains interactive form fields.
    The *tree* and *retval* parameters are for recursive use.

    :param fileobj: A file object (usually a text file) to write
        a report to on all interactive form fields found.
    :return: A dictionary where each key is a field name, and each
        value is a :class:`Field<PyPDF2.generic.Field>` object. By
        default, the mapping name is used for keys.
    :rtype: dict, or ``None`` if form data could not be located.
    """
    fieldAttributes = {'/FT': 'Field Type', '/Parent': 'Parent', '/T': 'Field Name', '/TU': 'Alternate Field Name',
                       '/TM': 'Mapping Name', '/Ff': 'Field Flags', '/V': 'Value', '/DV': 'Default Value'}
    if retval is None:
        retval = OrderedDict()
        catalog = obj.trailer["/Root"]
        # get the AcroForm tree
        if "/AcroForm" in catalog:
            tree = catalog["/AcroForm"]
        else:
            return None
    if tree is None:
        return retval

    obj._checkKids(tree, retval, fileobj)
    for attr in fieldAttributes:
        if attr in tree:
            # Tree is a field
            obj._buildField(tree, retval, fileobj, fieldAttributes)
            break

    if "/Fields" in tree:
        fields = tree["/Fields"]
        for f in fields:
            field = f.getObject()
            obj._buildField(field, retval, fileobj, fieldAttributes)

    return retval


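def get_form_fields(infile):
    # Keep the file open only while the fields are being read.
    with open(infile, 'rb') as f:
        fields = _getFields(PdfFileReader(f))
        if not fields:  # no AcroForm in this PDF
            return OrderedDict()
        return OrderedDict((k, v.get('/V', '')) for k, v in fields.items())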


def update_form_values(infile, outfile, newvals=None):
    pdf = PdfFileReader(open(infile, 'rb'))
    writer = PdfFileWriter()

    for i in range(pdf.getNumPages()):
        page = pdf.getPage(i)
        try:
            if newvals:
                writer.updatePageFormFieldValues(page, newvals)
            else:
                writer.updatePageFormFieldValues(page,
                                                 {k: f'#{i} {k}={v}'
                                                  for i, (k, v) in enumerate(get_form_fields(infile).items())
                                                  })
            writer.addPage(page)
        except Exception as e:
            print(repr(e))
            writer.addPage(page)

    with open(outfile, 'wb') as out:
        writer.write(out)


if __name__ == '__main__':
    from pprint import pprint

    pdf_file_name = '2PagesFormExample.pdf'

    pprint(get_form_fields(pdf_file_name))

    update_form_values(pdf_file_name, 'out-' + pdf_file_name)  # enumerate & fill the fields with their own names
    update_form_values(pdf_file_name, 'out2-' + pdf_file_name,
                       {'my_fieldname_1': 'My Value',
                        'my_fieldname_2': 'My Another Value'})  # update the form fields

Answer 2: (score: 0)

Replacing the original file

import os

# Fill the form into a new file, then swap it in for the original.
os.system('pdftk "original.pdf" fill_form "data.fdf" output "output.pdf"')
os.remove("data.fdf")
os.remove("original.pdf")
os.rename("output.pdf","original.pdf")
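One weakness of the remove-then-rename sequence is that the original is lost if something fails in between. A somewhat safer variation, sketched here under the assumption that the filling step can be wrapped in a callable, writes to a temporary file and swaps it in with `os.replace`. The name `replace_in_place` and the `produce` callback are hypothetical; `produce(src, dst)` stands for whatever writes the filled PDF, for example a pdftk call.

```python
import os
import tempfile


def replace_in_place(original, produce):
    """Write the new version to a temp file, then swap it over the original."""
    folder = os.path.dirname(os.path.abspath(original))
    fd, tmp_path = tempfile.mkstemp(suffix=".pdf", dir=folder)
    os.close(fd)
    try:
        produce(original, tmp_path)      # e.g. run pdftk with tmp_path as output
        os.replace(tmp_path, original)   # overwrites the original in one step
    except BaseException:
        # Leave the original untouched and clean up the temp file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

The temporary file is created in the same folder as the original so that the final swap stays on one filesystem.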

Answer 3: (score: 0)

I wrote a library called fillpdf, built on top of 'pdfrw', 'pdf2image', 'Pillow' and 'PyPDF2' (pip install fillpdf; the poppler dependency can be installed with conda install -c conda-forge poppler).

Basic usage:

from fillpdf import fillpdfs

fillpdfs.get_form_fields("blank.pdf")

# Returns a dictionary of fields.
# Set the values in the returned dictionary and save it to a variable.
# For radio boxes ('Off' = not filled, 'Yes' = filled)

data_dict = {
'Text2': 'Name',
'Text4': 'LastName',
'box': 'Yes',
}

fillpdfs.write_fillable_pdf('blank.pdf', 'new.pdf', data_dict)

# If you want it flattened:
fillpdfs.flatten_pdf('new.pdf', 'newflat.pdf')

More information here: https://github.com/t-houssian/fillpdf

If some fields do not get filled in, you can use fitz (pip install PyMuPDF) together with PyPDF2 (pip install PyPDF2) as shown below, adjusting the coordinates as needed:

import fitz
from PyPDF2 import PdfFileReader

file_handle = fitz.open('blank.pdf')
pdf = PdfFileReader(open('blank.pdf','rb'))
box = pdf.getPage(0).mediaBox
w = box.getWidth()
h = box.getHeight()

# For images
image_rectangle = fitz.Rect((w/2)-200,h-255,(w/2)-100,h-118)
pages = pdf.getNumPages() - 1
last_page = file_handle[pages]
last_page._wrapContents()
last_page.insertImage(image_rectangle, filename='image.png')

# For text
last_page.insertText(fitz.Point((w/2)-247 , h-478), 'John Smith', fontsize=14, fontname="times-bold")
file_handle.save('newpdf.pdf')
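The hard-coded offsets above are easier to adjust if the arithmetic is pulled into a helper. This is just a sketch of the same calculation with invented names; it assumes, as the snippet above does, that y grows downward from the top of the page, so subtracting from the page height places things near the bottom.

```python
def box_near_bottom(page_width, page_height, center_offset_x, width, height, gap_from_bottom):
    """Return (x0, y0, x1, y1) for a box placed relative to the page center and bottom."""
    x0 = page_width / 2 + center_offset_x
    y1 = page_height - gap_from_bottom
    y0 = y1 - height
    return (x0, y0, x0 + width, y1)


# Same numbers as the image rectangle above, on a 612 x 792 point (US Letter) page.
print(box_near_bottom(612, 792, center_offset_x=-200, width=100, height=137, gap_from_bottom=118))
```

The resulting tuple can be unpacked into `fitz.Rect(*coords)` in place of the inline expression.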