Question

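The PDF files we deal with are fillable PDF forms containing user-supplied data. We want to extract the data the user has entered into the form, but we don't know of any gem that does this. For example, the PDF has a field for "First Name" and the user has filled in "David"; we want to get at that data as something like "First Name => David".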

Looking at the properties of a sample file tells me:

PDF Producer: Adobe LiveCycle Designer ES 8.2
PDF Version: 1.7, Adobe Extension Level 3 (Acrobat 9.x)

Suggestions and ideas appreciated!

Thanks

Answer 1

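Because the output of pdftk's dump_data_fields has a very standardized structure, this method should do what you need. It outputs an Array in which each field is a Hash of attribute => value.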

def parse_pdf_dump(file)
  contents = File.read(file)
  # Fields are separated by "---" lines; drop blank chunks
  fields = contents.split("---").reject { |f| f.strip.empty? }
  # Create an Array of the fields
  fields.map do |field|
    # Create a hash of attribute => value for each field attribute
    field.split("\n").each_with_object({}) do |line, attrs|
      # Split on the first ":" only, in case the value itself contains ":"
      name, val = line.split(":", 2)
      # Skip blank lines, which have no attribute name/value pair
      next if name.nil? || val.nil?
      attrs[name.strip.downcase] = val.strip
    end
  end
end
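The question asks for pairs like "First Name" => "David", while parse_pdf_dump returns one Hash per field. A minimal sketch of collapsing that Array into a single Hash, assuming the downcased pdftk attribute keys "fieldname" and "fieldvalue"; fields_to_hash is a hypothetical helper, not part of any gem:

```ruby
# Hypothetical helper: collapse the Array of per-field Hashes returned by
# parse_pdf_dump into a single { field name => entered value } Hash.
# Assumes the downcased pdftk keys "fieldname" and "fieldvalue".
def fields_to_hash(fields_array)
  fields_array.each_with_object({}) do |field, result|
    name = field["fieldname"]
    next if name.nil?                   # skip chunks without a field name
    result[name] = field["fieldvalue"]  # nil when no FieldValue line exists
  end
end

# Example input shaped like parse_pdf_dump's output
fields_array = [
  { "fieldtype" => "Text", "fieldname" => "First Name", "fieldvalue" => "David" },
  { "fieldtype" => "Text", "fieldname" => "Last Name" }
]
p fields_to_hash(fields_array)
```

Keying by field name means duplicate field names collapse to the last value seen, which is usually acceptable for simple forms.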

Using active_pdftk, the call would look like this:

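require 'active_pdftk'
require 'tmpdir'

# Write the dump to a temp directory rather than the filesystem root
output_path = File.join(Dir.tmpdir, 'data_fields.txt')
# Replace with the path to your pdftk binary (or pdftk.exe on Windows)
pdftk = ActivePdftk::Wrapper.new(:path => '/path/to/pdftk')
# Replace with the path to your fillable PDF
pdftk.dump_data_fields('/path/to/form.pdf', :output => output_path)
fields_array = parse_pdf_dump(output_path)
# Remove the temporary text file
File.delete(output_path)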

So you use pdftk to dump the data fields to a text file, parse that file into the array fields_array, and then delete the text file.
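If you would rather not write and delete a temporary file, the pdftk command line tool prints the dump_data_fields report to stdout when no output file is given. A minimal sketch, assuming a pdftk executable is on your PATH; dump_fields_text and parse_dump_text are hypothetical helper names, and the parser follows the same approach as parse_pdf_dump but works on a String:

```ruby
require 'open3'

# Hypothetical helper: run the pdftk CLI directly and return the
# dump_data_fields report from stdout. Assumes `pdftk` is on the PATH.
def dump_fields_text(pdf_path)
  # Passing arguments separately avoids going through a shell
  out, status = Open3.capture2('pdftk', pdf_path, 'dump_data_fields')
  raise "pdftk failed (exit #{status.exitstatus})" unless status.success?
  out
end

# Parse the report text into an Array of Hashes keyed by the
# downcased attribute names (fieldtype, fieldname, fieldvalue, ...).
def parse_dump_text(text)
  # Split only on lines that consist of "---"
  chunks = text.split(/^---\s*$/).reject { |chunk| chunk.strip.empty? }
  chunks.map do |chunk|
    chunk.each_line.each_with_object({}) do |line, attrs|
      name, val = line.split(':', 2)  # split on the first ':' only
      next if val.nil?                # skip blank or malformed lines
      attrs[name.strip.downcase] = val.strip
    end
  end
end
```

Usage would then be fields_array = parse_dump_text(dump_fields_text('form.pdf')), with 'form.pdf' standing in for your file. Splitting on whole "---" lines rather than any "---" substring keeps a value that happens to contain three dashes intact.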

Ruby gem to extract form data from a fillable PDF
