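I am trying to print a post-processed (filled-in) PDF template that was created in LibreOffice and contains filled form fields.
The PDFBox svn repository is well done and has plenty of examples. Getting the PDF and its AcroForm is easy, and even editing the PDF and saving the modified version to disk works as expected. But that is not my goal. I want a PDF in which the fields have been filled in and then removed, so that only the text remains.
I tried everything I could find on Stack Overflow about PDFBox, from flattening the AcroForm to setting the read-only property on fields and other meta information, installing the necessary fonts, and so on. Every time I print the PDF to a file, the text in the text fields (edited and unedited) vanishes, and the text fields themselves are gone too.
But then I tried to create a PDF from scratch with PDFBox, and printing worked as expected. The text field was in the generated template, and the printed PDF file contained the text I wanted, with the corresponding form removed. So I used the PDF Debugger from PDFBox to analyze the structure of my PDF and noticed that, in the debugger's preview, my PDF exported from LibreOffice shows no text in the text fields. In the tree structure, however, the values are clearly present in the PDF annotation (/DV and /V), and it looks similar to the version created by PDFBox, which works.
For testing, I created a simple PDF file with a single text field named "test" and the content "Foobar". I also changed the background and border colors to see whether anything gets printed at all.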
    PDDocument document = null;
    try {
        document = PDDocument.load(new File("<filepath>\\<filename>"));
    } catch (final InvalidPasswordException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (final IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    PrintFields.createDummyPDF("<filepath>\\<filename>");
    PrintFields.printFields(document); // debug output

    // Getting pdf meta infos
    final PDDocumentCatalog docCatalog = document.getDocumentCatalog();
    final PDAcroForm acroForm = docCatalog.getAcroForm();
    docCatalog.setAcroForm(acroForm);

    // setting the appearance
    final PDFont font = PDType1Font.HELVETICA;
    final PDResources resources = new PDResources();
    resources.put(COSName.getPDFName("Helv"), font);
    acroForm.setDefaultResources(resources);
    String defaultAppearanceString = "/Helv 0 Tf 0 g";
    acroForm.setDefaultAppearance(defaultAppearanceString);
    for (final PDField f : acroForm.getFields()) {
        if (f instanceof PDTextField) {
            defaultAppearanceString = "/Helv 12 Tf 0 0 1 rg";
            // The /DA string belongs on the field; setAppearanceState() sets /AS,
            // which names an appearance state rather than a font or color.
            ((PDTextField) f).setDefaultAppearance(defaultAppearanceString);
        }
    }
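    for (final PDField f : acroForm.getFields()) {
        f.setReadOnly(true);
    }

    // save modified pdf to file
    document.save("<filepath>\\<filename>");

    // print to file (to pdf)
    // 'job' was not declared in this snippet; a PrinterJob backed by
    // PDFPageable is assumed here.
    final PrinterJob job = PrinterJob.getPrinterJob();
    job.setPageable(new PDFPageable(document));
    if (job.printDialog()) {
        try {
            // Desktop.getDesktop().print();
            job.print();
        } catch (final PrinterException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }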
    // copied from pdfbox examples
    public static void createDummyPDF(final String path) throws IOException
    {
        // Create a new document with an empty page.
        try (PDDocument document = new PDDocument())
        {
            final PDPage page = new PDPage(PDRectangle.A4);
            document.addPage(page);

            // Adobe Acrobat uses Helvetica as a default font and
            // stores that under the name '/Helv' in the resources dictionary
            final PDFont font = PDType1Font.HELVETICA;
            final PDResources resources = new PDResources();
            resources.put(COSName.getPDFName("Helv"), font);

            // Add a new AcroForm and add that to the document
            final PDAcroForm acroForm = new PDAcroForm(document);
            document.getDocumentCatalog().setAcroForm(acroForm);

            // Add and set the resources and default appearance at the form level
            acroForm.setDefaultResources(resources);

            // Acrobat sets the font size on the form level to be
            // auto sized as default. This is done by setting the font size to '0'
            String defaultAppearanceString = "/Helv 0 Tf 0 g";
            acroForm.setDefaultAppearance(defaultAppearanceString);

            // Add a form field to the form.
            final PDTextField textBox = new PDTextField(acroForm);
            textBox.setPartialName("SampleField");

            // Acrobat sets the font size to 12 as default
            // This is done by setting the font size to '12' on the
            // field level.
            // The text color is set to blue in this example.
            // To use black, replace "0 0 1 rg" with "0 0 0 rg" or "0 g".
            defaultAppearanceString = "/Helv 12 Tf 0 0 1 rg";
            textBox.setDefaultAppearance(defaultAppearanceString);

            // add the field to the acroform
            acroForm.getFields().add(textBox);

            // Specify the widget annotation associated with the field
            final PDAnnotationWidget widget = textBox.getWidgets().get(0);
            final PDRectangle rect = new PDRectangle(50, 750, 200, 50);
            widget.setRectangle(rect);
            widget.setPage(page);

            // set green border and yellow background
            // if you prefer defaults, just delete this code block
            final PDAppearanceCharacteristicsDictionary fieldAppearance
                    = new PDAppearanceCharacteristicsDictionary(new COSDictionary());
            fieldAppearance.setBorderColour(new PDColor(new float[]{0,1,0}, PDDeviceRGB.INSTANCE));
            fieldAppearance.setBackground(new PDColor(new float[]{1,1,0}, PDDeviceRGB.INSTANCE));
            widget.setAppearanceCharacteristics(fieldAppearance);

            // make sure the widget annotation is visible on screen and paper
            widget.setPrinted(true);

            // Add the widget annotation to the page
            page.getAnnotations().add(widget);

            // set the field value
            textBox.setValue("Sample field");

            document.save(path);
        }
    }
    // copied from pdfbox examples
    public static void processFields(final List<PDField> fields, final PDResources resources) {
        fields.stream().forEach(f -> {
            f.setReadOnly(true);
            final COSDictionary cosObject = f.getCOSObject();
            // Prefer the default value (/DV); fall back to the current value (/V)
            final String value = cosObject.getString(COSName.DV) == null ?
                    cosObject.getString(COSName.V) : cosObject.getString(COSName.DV);
            System.out.println("Setting " + f.getFullyQualifiedName() + ": " + value);
            try {
                f.setValue(value);
            } catch (final IOException e) {
                if (e.getMessage().matches("Could not find font: /.*")) {
                    // Register Helvetica under the missing font name, then retry
                    final String fontName = e.getMessage().replaceAll("^[^/]*/", "");
                    System.out.println("Adding fallback font for: " + fontName);
                    resources.put(COSName.getPDFName(fontName), PDType1Font.HELVETICA);
                    try {
                        f.setValue(value);
                    } catch (final IOException e1) {
                        e1.printStackTrace();
                    }
                } else {
                    e.printStackTrace();
                }
            }
            if (f instanceof PDNonTerminalField) {
                processFields(((PDNonTerminalField) f).getChildren(), resources);
            }
        });
    }
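The font fallback in processFields depends on pulling the missing font name out of the exception message. A minimal, JDK-only sketch of just that step is below. The class name MissingFontName and the helper extractFontName are hypothetical, and the message shape "Could not find font: /<name>" is assumed from the check in the code above rather than taken from the PDFBox documentation. Unlike the original check, it also tolerates a null message.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox.
class MissingFontName {
    // Assumed message shape: "Could not find font: /<name>"
    private static final Pattern MISSING_FONT =
            Pattern.compile("Could not find font: /(.*)");

    // Returns the font name without the leading slash, or empty if the
    // message is null or does not have the assumed shape.
    static Optional<String> extractFontName(final String message) {
        if (message == null) {
            return Optional.empty();
        }
        final Matcher m = MISSING_FONT.matcher(message);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(final String[] args) {
        System.out.println(extractFontName("Could not find font: /Helv"));
        System.out.println(extractFontName("some other failure"));
    }
}
```

In processFields, a result that is present would be the name to register in the resources before retrying setValue; an empty result corresponds to the branch that only prints the stack trace.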
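I expected the PDFs produced by document.save() and job.print() to look exactly the same in a viewer, but they do not. If I take the PDF produced by document.save() with read-only disabled, I can fill in the form with a PDF viewer such as FoxitReader and print it again, and that produces the correct output. With the job.print() version, the text contained in the (text) form fields vanishes. Does anyone know why this happens?
I am using PDFBox 2.0.13 (the latest release) and LibreOffice 6.1.4.2. Here is the referenced file, and here you can download the debugger (a jar file, runnable with java -jar).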