Question

This is my first time using pdfbox. Right now I am reading through some material on the Pdf website.

In short, I have a PDF like this:

[image of the PDF]

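The only difference is that my file has many different components (textField, RadioButton, CheckBox). For this PDF, I have to read these values: Mauro, Rossi, MyCompany. So far I have written the following code: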

PDDocument pdDoc = PDDocument.loadNonSeq( myFile, null );
PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
PDAcroForm pdAcroForm = pdCatalog.getAcroForm();

for(PDField pdField : pdAcroForm.getFields()){
    System.out.println(pdField.getValue());
}

Is this the correct way to read the values inside the form components? Any suggestions? Where can I learn more about pdfbox?

Answer 1

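The code you have should work. If you actually want to do something with the values, you will probably need some other methods. For example, you can fetch a specific field with pdAcroForm.getField(<fieldName>):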

PDField firstNameField = pdAcroForm.getField("firstName");
PDField lastNameField = pdAcroForm.getField("lastName");

Note that PDField is just a base class. You can cast a field to its subclass to get more interesting information out of it. For example:

PDCheckbox fullTimeSalary = (PDCheckbox) pdAcroForm.getField("fullTimeSalary");
if(fullTimeSalary.isChecked()) {
    log.debug("The person earns a full-time salary");
} else {
    log.debug("The person does not earn a full-time salary");
}
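The cast above throws a ClassCastException if the field turns out to be some other type, and a NullPointerException if no field has that name (as far as I know, getField returns null in that case). Here is a minimal sketch of a guarded lookup. Field, CheckBox, TextField and the Map are hypothetical stand-ins for PDField, its subclasses and the form, so that the snippet runs without PDFBox on the classpath. With the real library you would apply the same null check and instanceof test to the result of pdAcroForm.getField(...).

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in types only: Field, CheckBox and TextField mimic the shape of
// PDField and its subclasses so this sketch compiles without PDFBox.
class CastSketch {
    static class Field {
        final String name;
        Field(String name) { this.name = name; }
    }

    static class CheckBox extends Field {
        private final boolean checked;
        CheckBox(String name, boolean checked) { super(name); this.checked = checked; }
        boolean isChecked() { return checked; }
    }

    static class TextField extends Field {
        private final String value;
        TextField(String name, String value) { super(name); this.value = value; }
        String getValue() { return value; }
    }

    // Describe a field by name without risking a NullPointerException
    // or a ClassCastException.
    static String describe(Map<String, Field> form, String fieldName) {
        // form.get plays the role of pdAcroForm.getField(fieldName)
        Field field = form.get(fieldName);
        if (field == null) {
            return fieldName + ": no such field";
        }
        if (field instanceof CheckBox) {
            boolean checked = ((CheckBox) field).isChecked();
            return fieldName + ": " + (checked ? "checked" : "not checked");
        }
        if (field instanceof TextField) {
            return fieldName + ": " + ((TextField) field).getValue();
        }
        return fieldName + ": unhandled field type";
    }

    public static void main(String[] args) {
        Map<String, Field> form = new HashMap<>();
        form.put("fullTimeSalary", new CheckBox("fullTimeSalary", true));
        form.put("firstName", new TextField("firstName", "Mauro"));

        System.out.println(describe(form, "fullTimeSalary"));
        System.out.println(describe(form, "firstName"));
        System.out.println(describe(form, "doesNotExist"));
    }
}
```

The point is only the order of the checks: null first, then the type test, then the cast.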

As you already know, you can find more information on the apache pdfbox website.

Answer 2

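A field can be a top-level field that only groups other fields (a non-terminal field). So you have to descend through its children until you reach a field that is no longer such a container, and only then can you get the value. The code snippet below walks all fields and prints each field's name and value.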

{
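    //from your original code
    //note: loadNonSeq only exists in PDFBox 1.8.x; with PDFBox 2.x (which
    //PDNonTerminalField below requires) use PDDocument.load(myFile) instead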
    PDDocument pdDoc = PDDocument.loadNonSeq( myFile, null );
    PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
    PDAcroForm pdAcroForm = pdCatalog.getAcroForm();


    //get all fields in form
    List<PDField> fields = pdAcroForm.getFields();
    System.out.println(fields.size() + " top-level fields were found on the form");

    //inspect field values
    for (PDField field : fields)
    {
            processField(field, "|--", field.getPartialName());
    }

    ...
}


private void processField(PDField field, String sLevel, String sParent) throws IOException
{
        String partialName = field.getPartialName();

        if (field instanceof PDNonTerminalField)
        {
                if (!sParent.equals(field.getPartialName()))
                {
                        if (partialName != null)
                        {
                                sParent = sParent + "." + partialName;
                        }
                }
                System.out.println(sLevel + sParent);

                for (PDField child : ((PDNonTerminalField)field).getChildren())
                {
                        processField(child, "|  " + sLevel, sParent);
                }
        }
        else
        {
                //field has no children; output its value
                String fieldValue = field.getValueAsString();
                StringBuilder outputString = new StringBuilder(sLevel);
                outputString.append(sParent);
                if (partialName != null)
                {
                        outputString.append(".").append(partialName);
                }
                outputString.append(" = ").append(fieldValue);
                outputString.append(",  type=").append(field.getClass().getName());
                System.out.println(outputString);
        }
}
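If you want to use the values rather than print them, the same recursion can fill a map keyed by the dotted field name. Below is a minimal sketch of that idea. Node is a hypothetical stand-in for PDField (a node with children plays the role of PDNonTerminalField, a leaf plays the role of a terminal field), and the names person, firstName, lastName and company are made up for illustration, so the snippet runs without PDFBox.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FieldTreeSketch {
    // Hypothetical stand-in for PDField. children == null marks a terminal
    // field that carries a value; otherwise the node only groups other fields.
    static class Node {
        final String partialName;
        final String value;
        final List<Node> children;

        private Node(String partialName, String value, List<Node> children) {
            this.partialName = partialName;
            this.value = value;
            this.children = children;
        }

        static Node leaf(String partialName, String value) {
            return new Node(partialName, value, null);
        }

        static Node group(String partialName, Node... children) {
            return new Node(partialName, null, Arrays.asList(children));
        }
    }

    // Flatten the tree into "parent.child" -> value, keeping traversal order.
    static Map<String, String> flatten(List<Node> topLevelFields) {
        Map<String, String> values = new LinkedHashMap<>();
        for (Node field : topLevelFields) {
            collect(field, "", values);
        }
        return values;
    }

    private static void collect(Node field, String parentPath, Map<String, String> values) {
        String path = parentPath.isEmpty()
                ? field.partialName
                : parentPath + "." + field.partialName;
        if (field.children == null) {
            // terminal field: record its value under the full dotted name
            values.put(path, field.value);
            return;
        }
        for (Node child : field.children) {
            collect(child, path, values);
        }
    }

    public static void main(String[] args) {
        List<Node> form = Arrays.asList(
                Node.group("person",
                        Node.leaf("firstName", "Mauro"),
                        Node.leaf("lastName", "Rossi")),
                Node.leaf("company", "MyCompany"));

        // prints {person.firstName=Mauro, person.lastName=Rossi, company=MyCompany}
        System.out.println(flatten(form));
    }
}
```

With the real classes, the instanceof PDNonTerminalField test and getChildren() from processField above take the place of the children == null check.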

Getting form field values with pdfbox
