Question

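I have a Linux web server and am building a PHP application that works with PDF files. Currently I use pdftk to read the names and types of all form fields in the PDFs uploaded to my server.

Everything works well, but the problem I now have is that I cannot tell the format category (also known as the data type) of a text field. If it is set to "None", it behaves like an ordinary text field: I can assign any text to it without trouble. But as soon as someone selects a different value, I run into problems.

For example, a text field whose format category is "Number" only accepts numbers. If I try to send text to it, the field will not display it. The same problem applies to the format categories "Percentage", "Date", "Time", "Special", and "Custom". There must be some way to determine which one it is, so that I can tell users exactly what kind of data they can enter into the field.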
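
To make the limitation concrete, here is a minimal Python sketch that parses the text produced by `pdftk form.pdf dump_data_fields`. The sample text and the helper name `parse_dump_data_fields` are assumptions for illustration, not output from a real form; the point is that the per-field records contain a name and a type but no key describing the format category.

```python
# Hypothetical sample of dump_data_fields output (assumed for illustration).
SAMPLE_DUMP = """\
---
FieldType: Text
FieldName: Telephone
FieldFlags: 0
FieldJustification: Left
---
FieldType: Button
FieldName: Agree
FieldFlags: 0
FieldJustification: Left
"""


def parse_dump_data_fields(text):
    """Return one dict per field; records are separated by '---' lines.

    Keys that repeat within a record (for example option lists) keep only
    their last value in this simplified sketch.
    """
    fields = []
    current = None
    for line in text.splitlines():
        if line.strip() == "---":
            current = {}
            fields.append(current)
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return fields


fields = parse_dump_data_fields(SAMPLE_DUMP)
for field in fields:
    print(field["FieldName"], field["FieldType"])
```

Both fields come back with a type, but nothing in the parsed records distinguishes a "Number" text field from a plain one.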

Answer 1

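Fields are validated and/or formatted according to their AA (additional actions) entry, which is essentially a set of JavaScript function calls. Unfortunately, pdftk ignores this entry and exports it neither to FDF (generate_fdf) nor to text (dump_data_fields). Messing with the pdftk source is beyond my abilities. If you cannot find a ready-made solution for extracting this information from PDF forms, you can write a small program using an API that gives access either to all field properties or to the low-level COS structure. For example: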

$ perl -Mstrict -MCAM::PDF -we '
# Open the PDF named on the command line.
my $doc = CAM::PDF->new($ARGV[0]) or die();
for ($doc->getFormFieldList()) {
    my $field = $doc->getFormField($_);
    next unless defined $field;
    my $dict = $doc->getValue($field);
    # Only text fields (/FT /Tx) are of interest here.
    next unless exists $$dict{FT} and
        $doc->getValue($$dict{FT}) eq "Tx";
    print "Field \"$_\" ";
    # /AA -> /F is the format action; its /JS entry holds the script.
    my $AA = $doc->getValue($$dict{AA});
    my $F = $doc->getValue($$AA{F});
    my $JS = $doc->getValue($$F{JS});
    print $JS ? "is formatted as \"$JS\"\n" : "is a plain text\n";
}
' MIVoterRegistration_97046_7.pdf

Sorry about the example PDF :-), it was the first quick Google result I checked that contained an AA entry. The output is:

Field "LastName" is a plain text
Field "FirstName" is a plain text
Field "Middle Name" is a plain text
Field "Address" is a plain text
Field "Apart#" is a plain text
Field "City" is a plain text
Field "ZipCode" is a plain text
Field "Telephone" is formatted as "AFSpecial_Format(2);"
Field "describe" is a plain text
Field "c/t" is a plain text
Field "County" is a plain text
Field "School" is a plain text
Field "MailAddress" is a plain text
Field "DOB" is a plain text
Field "DLNumber" is a plain text
Field "DLState" is a plain text
Field "SSNumber" is a plain text
Field "PrevAddress" is a plain text
Field "PrevC/T" is a plain text
Field "PrevCounty" is a plain text
Field "PrevState" is a plain text
Field "PrevZipCode" is a plain text
Field "PrevName" is a plain text

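The functions that define field formats are described here, and AFSpecial_Format(2) is indeed the phone number format. Strangely, it looks like the creator of the form considered all the other fields unimportant :-).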
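
Once the format script has been extracted, it can be mapped back to the category names from the question. The following Python sketch is one way to do that, under stated assumptions: the function names in `FORMAT_FUNCTIONS` are the Acrobat built-ins as I understand them, the `SPECIAL_FORMATS` codes other than 2 (phone number, confirmed above) are assumed, and `classify_format` is a hypothetical helper. Any script that does not start with a known built-in is treated as "Custom".

```python
import re

# Assumed mapping from Acrobat built-in format functions to category names.
FORMAT_FUNCTIONS = {
    "AFNumber_Format": "Number",
    "AFPercent_Format": "Percentage",
    "AFDate_Format": "Date",
    "AFDate_FormatEx": "Date",
    "AFTime_Format": "Time",
    "AFTime_FormatEx": "Time",
    "AFSpecial_Format": "Special",
}

# Assumed argument codes for AFSpecial_Format; only 2 is confirmed above.
SPECIAL_FORMATS = {
    0: "Zip Code",
    1: "Zip Code + 4",
    2: "Phone Number",
    3: "Social Security Number",
}


def classify_format(js):
    """Guess the format category from the /AA /F JavaScript string."""
    if not js:
        return "None"  # no format action: plain text field
    match = re.match(r"\s*(AF\w+)\s*\(", js)
    if match is None:
        return "Custom"  # hand-written script, not a built-in call
    category = FORMAT_FUNCTIONS.get(match.group(1), "Custom")
    if category == "Special":
        arg = re.search(r"\(\s*(\d+)\s*\)", js)
        if arg:
            kind = SPECIAL_FORMATS.get(int(arg.group(1)), "unknown")
            return "Special: " + kind
    return category


print(classify_format("AFSpecial_Format(2);"))  # Special: Phone Number
print(classify_format(None))                    # None
```

A PHP application could apply the same lookup to the strings printed by the Perl one-liner and use the result to tell users what kind of input each field accepts.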

How can I get the format category (data type) of text fields in a PDF using pdftk?
