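Suppose I have managed to cast a PDTerminalField to a PDPushButton instance. Looking at the API, though, I cannot see how to extract the label of that button.
I am not adding code because the application is too verbose. Here is a sample PDF.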
Answer 0 (score: 2)
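(Thanks to @Tilman for correcting me here.) Such a property does exist: you can access it via getAppearanceCharacteristics().getNormalCaption() on the button's widget annotation. However, this property is optional, and its content is not guaranteed to match the visual appearance of the button, because the appearance stream may contain different information. A combined strategy of querying the property and reading the appearance stream may therefore be necessary.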
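One way to picture such a combined strategy is the following minimal sketch in plain Java. It is only an illustration under stated assumptions: the class ButtonLabelResolver and its resolve method are hypothetical names, and the two suppliers stand in for the real lookups (for example, one wrapping getAppearanceCharacteristics().getNormalCaption() and one wrapping text extraction from the appearance stream). Trying the caption first is a judgment call, not something the PDF format prescribes.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper: combines two label sources into one lookup.
final class ButtonLabelResolver {

    // Tries the optional caption first; falls back to the text extracted
    // from the appearance stream when the caption is missing or blank.
    static Optional<String> resolve(Supplier<String> captionLookup,
                                    Supplier<String> appearanceTextLookup) {
        String caption = captionLookup.get();
        if (caption != null && !caption.trim().isEmpty()) {
            return Optional.of(caption);
        }
        String appearanceText = appearanceTextLookup.get();
        if (appearanceText != null && !appearanceText.trim().isEmpty()) {
            return Optional.of(appearanceText);
        }
        // Neither source yielded a usable label.
        return Optional.empty();
    }
}
```

If the visible text matters more than the declared caption, the two lookups can simply be swapped.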
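The appearance stream of a button in a PDF can contain any number of graphics and text drawing instructions that draw the button, but this stream is not necessarily easy to read or parse. For example, in the sample file provided by the OP, the stream looks like this: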
1 0.75 0.666656 rg
0 0 72 20 re
f
q
1 1 70 18 re
W
n
0 g
BT
/HeBo 12 Tf
0 g
6.696 5.857 Td
(My ) Tj
19.992 0 Td
(Button) Tj
ET
Q
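Here the button text "My Button" is already visible, but some parsing is clearly required to retrieve it, especially since the text encoding need not be ASCII-derived as it is in this case. In other words, text extraction has to be applied to the stream.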
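To make the parsing problem concrete, here is a deliberately naive sketch in plain Java (no PDFBox) that pulls the literal string operands of Tj operators out of a stream like the one above with a regular expression. The class name NaiveTjExtractor is hypothetical. It works here only because this particular stream uses simple literal strings in an ASCII-compatible encoding; it ignores TJ arrays, hex strings, escape sequences, and font encodings, which is exactly why a real content stream engine is the more robust choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, intentionally simplistic extractor for illustration only.
final class NaiveTjExtractor {

    // Matches a literal string operand followed by the Tj operator, e.g. "(My ) Tj".
    // Assumption: the string contains no nested or escaped parentheses.
    private static final Pattern TJ = Pattern.compile("\\(([^()\\\\]*)\\)\\s*Tj");

    // Concatenates all Tj string operands in the order they appear.
    static String extract(String contentStream) {
        StringBuilder text = new StringBuilder();
        Matcher matcher = TJ.matcher(contentStream);
        while (matcher.find()) {
            text.append(matcher.group(1));
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String stream = String.join("\n",
                "BT", "/HeBo 12 Tf", "0 g", "6.696 5.857 Td", "(My ) Tj",
                "19.992 0 Td", "(Button) Tj", "ET");
        System.out.println(extract(stream)); // prints "My Button"
    }
}
```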
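Unfortunately, the main text extraction workhorse in PDFBox, the PDFTextStripper class, is difficult to apply to anything other than page content. I therefore used the PDFStreamEngine base class from which the text stripper is ultimately derived, added only minimal functionality for collecting text, and applied it to the button appearance stream.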
import java.io.IOException;
import org.apache.pdfbox.contentstream.PDFStreamEngine;
import org.apache.pdfbox.contentstream.operator.DrawObject;
import org.apache.pdfbox.contentstream.operator.state.Concatenate;
import org.apache.pdfbox.contentstream.operator.state.Restore;
import org.apache.pdfbox.contentstream.operator.state.Save;
import org.apache.pdfbox.contentstream.operator.state.SetGraphicsStateParameters;
import org.apache.pdfbox.contentstream.operator.state.SetMatrix;
import org.apache.pdfbox.contentstream.operator.text.BeginText;
import org.apache.pdfbox.contentstream.operator.text.EndText;
import org.apache.pdfbox.contentstream.operator.text.MoveText;
import org.apache.pdfbox.contentstream.operator.text.MoveTextSetLeading;
import org.apache.pdfbox.contentstream.operator.text.NextLine;
import org.apache.pdfbox.contentstream.operator.text.SetCharSpacing;
import org.apache.pdfbox.contentstream.operator.text.SetFontAndSize;
import org.apache.pdfbox.contentstream.operator.text.SetTextHorizontalScaling;
import org.apache.pdfbox.contentstream.operator.text.SetTextLeading;
import org.apache.pdfbox.contentstream.operator.text.SetTextRenderingMode;
import org.apache.pdfbox.contentstream.operator.text.SetTextRise;
import org.apache.pdfbox.contentstream.operator.text.SetWordSpacing;
import org.apache.pdfbox.contentstream.operator.text.ShowText;
import org.apache.pdfbox.contentstream.operator.text.ShowTextAdjusted;
import org.apache.pdfbox.contentstream.operator.text.ShowTextLine;
import org.apache.pdfbox.contentstream.operator.text.ShowTextLineAndSpace;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
import org.apache.pdfbox.util.Matrix;
import org.apache.pdfbox.util.Vector;

public class SimpleXObjectTextStripper extends PDFStreamEngine {
    // Collects the Unicode text of every glyph shown while processing a stream.
    private final StringBuilder stringBuilder = new StringBuilder();

    public SimpleXObjectTextStripper() {
        // Register only the operators needed to track text and graphics state.
        addOperator(new BeginText());
        addOperator(new Concatenate());
        addOperator(new DrawObject()); // special text version
        addOperator(new EndText());
        addOperator(new SetGraphicsStateParameters());
        addOperator(new Save());
        addOperator(new Restore());
        addOperator(new NextLine());
        addOperator(new SetCharSpacing());
        addOperator(new MoveText());
        addOperator(new MoveTextSetLeading());
        addOperator(new SetFontAndSize());
        addOperator(new ShowText());
        addOperator(new ShowTextAdjusted());
        addOperator(new SetTextLeading());
        addOperator(new SetMatrix());
        addOperator(new SetTextRenderingMode());
        addOperator(new SetTextRise());
        addOperator(new SetWordSpacing());
        addOperator(new SetTextHorizontalScaling());
        addOperator(new ShowTextLine());
        addOperator(new ShowTextLineAndSpace());
    }

    // Returns the text drawn by the given form XObject (e.g. an appearance stream).
    public String getText(PDFormXObject form) throws IOException {
        stringBuilder.setLength(0);
        processChildStream(form, new PDPage());
        return stringBuilder.toString();
    }

    // Called once per glyph; appends its Unicode value in drawing order.
    @Override
    protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, String unicode, Vector displacement)
            throws IOException {
        stringBuilder.append(unicode);
    }
}
(I included the import statements because PDFBox contains several classes with similar names here.)
With this simple custom stripper class, the text content of the field appearances can be extracted as follows:
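import java.io.IOException;
import java.util.Collections;
import java.util.Map;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationWidget;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAppearanceDictionary;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAppearanceEntry;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAppearanceStream;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
import org.apache.pdfbox.pdmodel.interactive.form.PDTerminalField;

// Prints the text of every normal appearance stream of every terminal field.
public void showNormalFieldAppearanceTexts(PDDocument document) throws IOException {
    PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
    if (acroForm != null) {
        SimpleXObjectTextStripper stripper = new SimpleXObjectTextStripper();
        for (PDField field : acroForm.getFieldTree()) {
            if (field instanceof PDTerminalField) {
                PDTerminalField terminalField = (PDTerminalField) field;
                System.out.println();
                System.out.println("* " + terminalField.getFullyQualifiedName());
                for (PDAnnotationWidget widget : terminalField.getWidgets()) {
                    PDAppearanceDictionary appearance = widget.getAppearance();
                    if (appearance != null) {
                        PDAppearanceEntry normal = appearance.getNormalAppearance();
                        if (normal != null) {
                            // The normal appearance is either a single stream or a dictionary of named streams.
                            Map<COSName, PDAppearanceStream> streams = normal.isSubDictionary() ? normal.getSubDictionary() :
                                    Collections.singletonMap(COSName.DEFAULT, normal.getAppearanceStream());
                            for (Map.Entry<COSName, PDAppearanceStream> entry : streams.entrySet()) {
                                String text = stripper.getText(entry.getValue());
                                System.out.printf(" * %s: %s\n", entry.getKey().getName(), text);
                            }
                        }
                    }
                }
            }
        }
    }
}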
(ExtractAppearanceText helper method)