How do I "flatten" a PDF form (remove the form fields but keep the field text) with PDFBox?
Same question was answered here:
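The short way to do this is to remove the fields from the AcroForm.
For this you just need to get the document catalog, then the AcroForm, and then remove all fields from this AcroForm.
The graphical representation is linked with the annotation and stays in the document.
So I wrote this code: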
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
public class PdfBoxTest {
public void test() throws Exception {
PDDocument pdDoc = PDDocument.load(new File("E:\\Form-Test.pdf"));
PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
PDAcroForm acroForm = pdCatalog.getAcroForm();
if (acroForm == null) {
System.out.println("No form-field --> stop");
return;
}
@SuppressWarnings("unchecked")
List<PDField> fields = acroForm.getFields();
// set the text in the form-field <-- does work
for (PDField field : fields) {
if (field.getFullyQualifiedName().equals("formfield1")) {
field.setValue("Test-String");
}
}
// remove form-field but keep text ???
// acroForm.getFields().clear(); <-- does not work
// acroForm.setFields(null); <-- does not work
// acroForm.setFields(new ArrayList()); <-- does not work
// ???
pdDoc.save("E:\\Form-Test-Result.pdf");
pdDoc.close();
}
}
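Answer 0 (score: 14)
With PDFBox 2, you can now easily "flatten" a PDF form by calling the flatten method on a PDAcroForm object. See the Javadoc: PDAcroForm.flatten().
Simplified code with a sample call of this method: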
//Load the document
PDDocument pDDocument = PDDocument.load(new File("E:\\Form-Test.pdf"));
PDAcroForm pDAcroForm = pDDocument.getDocumentCatalog().getAcroForm();
//Fill the document
...
//Flatten the document
pDAcroForm.flatten();
//Save the document
pDDocument.save("E:\\Form-Test-Result.pdf");
pDDocument.close();
Note: dynamic XFA forms cannot be flattened.
To migrate from PDFBox 1.* to 2.0, take a look at the official migration guide.
Answer 1 (score: 7)
setReadOnly worked for me, as shown below:
@SuppressWarnings("unchecked")
List<PDField> fields = acroForm.getFields();
for (PDField field : fields) {
if (field.getFullyQualifiedName().equals("formfield1")) {
field.setReadOnly(true);
}
}
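Answer 2 (score: 7)
This works for sure. I ran into this problem and spent a whole night debugging, but finally figured out how to do it :)
This assumes that you are able to edit the PDF in some way or otherwise have control over it.
First, edit the forms using Acrobat Pro. Make them hidden and read-only.
Then you need to use two libraries: PDFBox and PDFClown.
PDFBox removes the thing that tells Adobe Reader that the document is a form; PDFClown removes the actual field. PDFClown must run first, then PDFBox (in that order; the other way around does not work).
Single-field example code: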
// PDF Clown code
File file = new File("Some file path");
Document document = file.getDocument();
Form form = document.getForm();
Fields fields = form.getFields();
Field field = fields.get("some_field_name");
PageStamper stamper = new PageStamper();
FieldWidgets widgets = field.getWidgets();
Widget widget = widgets.get(0); // Generally is 0.. experiment to figure out
stamper.setPage(widget.getPage());
// Write text using text form field position as pivot.
PrimitiveComposer composer = stamper.getForeground();
Font font = Font.get(document, "some_path");
composer.setFont(font, 10);
double xCoordinate = widget.getBox().getX();
double yCoordinate = widget.getBox().getY();
composer.showText("text i want to display", new Point2D.Double(xCoordinate, yCoordinate));
// Actually delete the form field!
field.delete();
stamper.flush();
// Create new buffer to output to...
Buffer buffer = new Buffer();
file.save(buffer, SerializationModeEnum.Standard);
byte[] bytes = buffer.toByteArray();
// PDFBox code
InputStream pdfInput = new ByteArrayInputStream(bytes);
PDDocument pdfDocument = PDDocument.load(pdfInput);
// Tell Adobe we don't have forms anymore.
PDDocumentCatalog pdCatalog = pdfDocument.getDocumentCatalog();
PDAcroForm acroForm = pdCatalog.getAcroForm();
COSDictionary acroFormDict = acroForm.getDictionary();
COSArray cosFields = (COSArray) acroFormDict.getDictionaryObject("Fields");
cosFields.clear();
// Phew. Finally.
pdfDocument.save("Some file path");
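There may be a few typos here and there, but this should be enough to get the gist :)
Answer 3 (score: 4)
After reading the PDF reference guide, I found that you can quite easily set read-only mode for AcroForm fields by adding the "Ff" key (field flags) with a value of 1. This is what the documentation says about it:
If set, the user may not change the value of the field. Any associated widget annotations will not interact with the user; that is, they will not respond to mouse clicks or change their appearance in response to mouse actions. This flag is useful for fields whose values are computed or imported from a database.
So the code could look like this (using the pdfbox lib):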
public static void makeAllWidgetsReadOnly(PDDocument pdDoc) throws IOException {
PDDocumentCatalog catalog = pdDoc.getDocumentCatalog();
PDAcroForm form = catalog.getAcroForm();
List<PDField> acroFormFields = form.getFields();
System.out.println(String.format("found %d AcroForm fields", acroFormFields.size()));
for(PDField field: acroFormFields) {
makeAcroFieldReadOnly(field);
}
}
private static void makeAcroFieldReadOnly(PDField field) {
field.getDictionary().setInt("Ff",1);
}
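One caveat with writing the literal value 1: Ff is a bit field, so doing that would also clear any other flags already set on the field (for example Multiline on a text field). A minimal sketch of setting only the ReadOnly bit, in plain Java without PDFBox; the class and method names are hypothetical and only illustrate the bit arithmetic:

```java
/** Hypothetical helper: flag arithmetic for the AcroForm "Ff" (field flags) entry. */
final class FieldFlags {
    /** Bit position 1 in the PDF specification (integer value 1) is the ReadOnly flag. */
    static final int READ_ONLY = 1;

    private FieldFlags() {}

    /** Returns the given flags with ReadOnly set and every other bit left untouched. */
    static int withReadOnly(int currentFlags) {
        return currentFlags | READ_ONLY;
    }

    /** Returns true when the ReadOnly bit is set in the given flags. */
    static boolean isReadOnly(int flags) {
        return (flags & READ_ONLY) != 0;
    }
}
```

With the dictionary access used above, this would presumably be applied by reading the current "Ff" value (defaulting to 0 when the key is absent) and writing back FieldFlags.withReadOnly(current); how you read the current value depends on your PDFBox version.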
Answer 4 (score: 3)
A solution for flattening an AcroForm with pdfBox while keeping the form field values:
The solution that worked for me with pdfbox 2.0.1:
File myFile = new File("myFile.pdf");
PDDocument pdDoc = PDDocument.load(myFile);
PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
PDAcroForm pdAcroForm = pdCatalog.getAcroForm();
// set the NeedAppearances flag to false
pdAcroForm.setNeedAppearances(false);
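// look up the field to fill; "formfield1" is the field name used in the question
PDField field = pdAcroForm.getField("formfield1");
field.setValue("new-value");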
pdAcroForm.flatten();
pdDoc.save("myFlattenedFile.pdf");
pdDoc.close();
I did not need to perform the following 2 extra steps from the solution linked above:
// correct the missing page link for the annotations
// Add the missing resources to the form
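I created my PDF form in OpenOffice 4.1.1 and exported it to PDF. The 2 items selected in the OpenOffice export dialog were:
Using PdfBox, I filled in the form fields and created a flattened PDF file, which removed the form fields but kept the form field values.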
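Answer 5 (score: 2)
To really "flatten" an AcroForm field, there seems to be much more to do than it appears at first glance. After examining the PDF standard, I managed to achieve real flattening in three steps: save the field value, remove the widgets, and remove the form field.
All three steps can be done with pdfbox (I'm using 1.8.5). Below I'll sketch how I did it. A very helpful tool for understanding what is going on is the PDF Debugger.
Saving the field value is the most complicated step of the three.
To save the field's value, you have to write its content into the PDF page content for each of the field's widgets. The easiest way to do so is to draw each widget's appearance onto the widget's page.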
void saveFieldValue( PDField field ) throws IOException
{
PDDocument document = getDocument( field );
// see PDField.getWidget()
for( PDAnnotationWidget widget : getWidgets( field ) )
{
PDPage parentPage = getPage( widget );
try (PDPageContentStream contentStream = new PDPageContentStream( document, parentPage, true, true ))
{
writeContent( contentStream, widget );
}
}
}
void writeContent( PDPageContentStream contentStream, PDAnnotationWidget widget )
throws IOException
{
PDAppearanceStream appearanceStream = getAppearanceStream( widget );
PDXObject xobject = new PDXObjectForm( appearanceStream.getStream() );
AffineTransform transformation = getPositioningTransformation( widget.getRectangle() );
contentStream.drawXObject( xobject, transformation );
}
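The appearance is an XObject stream containing all of the widget's content (value, font, size, rotation, etc.). You just need to place it at the right position on the page, which you can extract from the widget's rectangle.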
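The getPositioningTransformation helper is not shown in this answer. A minimal sketch of what it could look like, assuming the appearance stream's bounding box starts at the origin and has no matrix of its own, so that a plain translation to the rectangle's lower-left corner is enough; the class name and the two float parameters (standing in for the rectangle's lower-left x and y) are hypothetical:

```java
import java.awt.geom.AffineTransform;

/** Hypothetical helper: where to draw a widget's appearance on its page. */
final class WidgetPlacement {
    private WidgetPlacement() {}

    /**
     * Builds a pure translation that moves the appearance's origin to the
     * lower-left corner of the widget rectangle (no scaling, no rotation).
     */
    static AffineTransform positioningTransformation(float lowerLeftX, float lowerLeftY) {
        return AffineTransform.getTranslateInstance(lowerLeftX, lowerLeftY);
    }
}
```

This corresponds to emitting the content stream operator "1 0 0 1 x y cm" before drawing the appearance. Appearances with a non-zero bounding box origin or their own matrix would need a more complete calculation.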
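As noted above, a field may have multiple widgets. A widget takes care of how a form field can be edited, of its triggers, and of displaying the content when the field is not being edited.
To remove one, you have to remove it from its page's annotations.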
void removeWidget( PDAnnotationWidget widget ) throws IOException
{
PDPage widgetPage = getPage( widget );
List<PDAnnotation> annotations = widgetPage.getAnnotations();
PDAnnotation deleteCandidate = getMatchingCOSObjectable( annotations, widget );
if( deleteCandidate != null && annotations.remove( deleteCandidate ) )
widgetPage.setAnnotations( annotations );
}
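Note that the annotations may not contain the exact PDAnnotationWidget instance, since it is a kind of wrapper. You have to remove the one with the matching COSObject.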
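The getMatchingCOSObjectable helper is not shown either. A rough sketch of the idea in plain Java, assuming each wrapper can hand back the underlying object it wraps; the class name, method name, and the unwrap function are hypothetical, and the comparison is by identity rather than equals():

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper: find the wrapper that wraps a given underlying object. */
final class WrapperMatcher {
    private WrapperMatcher() {}

    /**
     * Returns the first candidate whose unwrapped object is the very same
     * instance as target, or null when no candidate matches.
     */
    static <T> T findByUnderlying(List<T> candidates, Object target, Function<T, Object> unwrap) {
        for (T candidate : candidates) {
            if (unwrap.apply(candidate) == target) {
                return candidate;
            }
        }
        return null;
    }
}
```

For annotations, the unwrap function would return each annotation's underlying dictionary, and target would be the widget's dictionary.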
As the last step, you remove the form field itself. This is not very different from the other posts above.
void removeFormfield( PDField field ) throws IOException
{
PDAcroForm acroForm = field.getAcroForm();
List<PDField> acroFields = acroForm.getFields();
List<PDField> removeCandidates = getFields( acroFields, field.getPartialName() );
if( removeAll( acroFields, removeCandidates ) )
acroForm.setFields( acroFields );
}
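Note that I used a custom removeAll method here, because removeCandidates.removeAll() did not work as expected for me.
Sorry that I cannot provide all of the code here, but with the above you should be able to write it yourself.
Answer 6 (score: 1)
Here is the code I came up with after combining all the answers I could find on this subject. It handles flattening text boxes, combos, lists, checkboxes, and radios: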
public static void flattenPDF (PDDocument doc) throws IOException {
//
// find the fields and their kids (widgets) on the input document
// (each child widget represents an appearance of the field data on the page, there may be multiple appearances)
//
PDDocumentCatalog catalog = doc.getDocumentCatalog();
PDAcroForm form = catalog.getAcroForm();
List<PDField> tmpfields = form.getFields();
PDResources formresources = form.getDefaultResources();
Map formfonts = formresources.getFonts();
PDAnnotation ann;
//
// for each input document page convert the field annotations on the page into
// content stream
//
List<PDPage> pages = catalog.getAllPages();
Iterator<PDPage> pageiterator = pages.iterator();
while (pageiterator.hasNext()) {
//
// get next page from input document
//
PDPage page = pageiterator.next();
//
// add the fonts from the input form to this pages resources
// so the field values will display in the proper font
//
PDResources pageResources = page.getResources();
Map pageFonts = pageResources.getFonts();
pageFonts.putAll(formfonts);
pageResources.setFonts(pageFonts);
//
// Create a content stream for the page for appending
//
PDPageContentStream contentStream = new PDPageContentStream(doc, page, true, true);
//
// Find the appearance widgets for all fields on the input page and insert them into content stream of the page
//
for (PDField tmpfield : tmpfields) {
List widgets = tmpfield.getKids();
if(widgets == null) {
widgets = new ArrayList();
widgets.add(tmpfield.getWidget());
}
Iterator<COSObjectable> widgetiterator = widgets.iterator();
while (widgetiterator.hasNext()) {
COSObjectable next = widgetiterator.next();
if (next instanceof PDField) {
PDField foundfield = (PDField) next;
ann = foundfield.getWidget();
} else {
ann = (PDAnnotation) next;
}
if (ann.getPage().equals(page)) {
COSDictionary dict = ann.getDictionary();
if (dict != null) {
if(tmpfield instanceof PDVariableText || tmpfield instanceof PDPushButton) {
COSDictionary ap = (COSDictionary) dict.getDictionaryObject("AP");
if (ap != null) {
contentStream.appendRawCommands("q\n");
COSArray rectarray = (COSArray) dict.getDictionaryObject("Rect");
if (rectarray != null) {
float[] rect = rectarray.toFloatArray();
String s = " 1 0 0 1 " + Float.toString(rect[0]) + " " + Float.toString(rect[1]) + " cm\n";
contentStream.appendRawCommands(s);
}
COSStream stream = (COSStream) ap.getDictionaryObject("N");
if (stream != null) {
InputStream ioStream = stream.getUnfilteredStream();
ByteArrayOutputStream byteArray = new ByteArrayOutputStream();
byte[] buffer = new byte[4096];
int amountRead = 0;
while ((amountRead = ioStream.read(buffer, 0, buffer.length)) != -1) {
byteArray.write(buffer, 0, amountRead);
}
contentStream.appendRawCommands(byteArray.toString() + "\n");
}
contentStream.appendRawCommands("Q\n");
}
} else if (tmpfield instanceof PDChoiceButton) {
COSDictionary ap = (COSDictionary) dict.getDictionaryObject("AP");
if(ap != null) {
contentStream.appendRawCommands("q\n");
COSArray rectarray = (COSArray) dict.getDictionaryObject("Rect");
if (rectarray != null) {
float[] rect = rectarray.toFloatArray();
String s = " 1 0 0 1 " + Float.toString(rect[0]) + " " + Float.toString(rect[1]) + " cm\n";
contentStream.appendRawCommands(s);
}
COSName cbValue = (COSName) dict.getDictionaryObject(COSName.AS);
COSDictionary d = (COSDictionary) ap.getDictionaryObject(COSName.D);
if (d != null) {
COSStream stream = (COSStream) d.getDictionaryObject(cbValue);
if(stream != null) {
InputStream ioStream = stream.getUnfilteredStream();
ByteArrayOutputStream byteArray = new ByteArrayOutputStream();
byte[] buffer = new byte[4096];
int amountRead = 0;
while ((amountRead = ioStream.read(buffer, 0, buffer.length)) != -1) {
byteArray.write(buffer, 0, amountRead);
}
if (!(tmpfield instanceof PDCheckbox)){
contentStream.appendRawCommands(byteArray.toString() + "\n");
}
}
}
COSDictionary n = (COSDictionary) ap.getDictionaryObject(COSName.N);
if (n != null) {
COSStream stream = (COSStream) n.getDictionaryObject(cbValue);
if(stream != null) {
InputStream ioStream = stream.getUnfilteredStream();
ByteArrayOutputStream byteArray = new ByteArrayOutputStream();
byte[] buffer = new byte[4096];
int amountRead = 0;
while ((amountRead = ioStream.read(buffer, 0, buffer.length)) != -1) {
byteArray.write(buffer, 0, amountRead);
}
contentStream.appendRawCommands(byteArray.toString() + "\n");
}
}
contentStream.appendRawCommands("Q\n");
}
}
}
}
}
}
// delete any field widget annotations and write it all to the page
// leave other annotations on the page
COSArrayList newanns = new COSArrayList();
List anns = page.getAnnotations();
ListIterator annotiterator = anns.listIterator();
while (annotiterator.hasNext()) {
COSObjectable next = (COSObjectable) annotiterator.next();
if (!(next instanceof PDAnnotationWidget)) {
newanns.add(next);
}
}
page.setAnnotations(newanns);
contentStream.close();
}
//
// Delete all fields from the form and their widgets (kids)
//
for (PDField tmpfield : tmpfields) {
List kids = tmpfield.getKids();
if(kids != null) kids.clear();
}
tmpfields.clear();
// Tell Adobe we don't have forms anymore.
PDDocumentCatalog pdCatalog = doc.getDocumentCatalog();
PDAcroForm acroForm = pdCatalog.getAcroForm();
COSDictionary acroFormDict = acroForm.getDictionary();
COSArray cosFields = (COSArray) acroFormDict.getDictionaryObject("Fields");
cosFields.clear();
}
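Answer 7 (score: 1)
I don't have enough points to comment, but SJohnson's response about setting the field to read-only worked perfectly for me. I'm using something like this with PDFBox: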
private void setFieldValueAndFlatten(PDAcroForm form, String fieldName, String fieldValue) throws IOException {
PDField field = form.getField(fieldName);
if(field != null){
field.setValue(fieldValue);
field.setReadonly(true);
}
}
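This writes your field value; when you open the PDF after saving, it will show your value and will not be editable.
Answer 8 (score: 0)
This is Thomas's answer, from the PDFBox mailing list:
You need to get the Fields via the COSDictionary. Try this code...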
PDDocument pdDoc = PDDocument.load(new File("E:\\Form-Test.pdf"));
PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
PDAcroForm acroForm = pdCatalog.getAcroForm();
COSDictionary acroFormDict = acroForm.getDictionary();
COSArray fields = (COSArray) acroFormDict.getDictionaryObject("Fields");
fields.clear();
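Answer 9 (score: 0)
I thought I would share the approach that worked for us with PDFBox 2+.
We used the PDAcroForm.flatten() method.
The fields needed some preprocessing; most importantly, the nested field structure had to be traversed and DV and V checked for values.
In the end, this is what worked: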
private static void flattenPDF(String src, String dst) throws IOException {
PDDocument doc = PDDocument.load(new File(src));
PDDocumentCatalog catalog = doc.getDocumentCatalog();
PDAcroForm acroForm = catalog.getAcroForm();
PDResources resources = new PDResources();
acroForm.setDefaultResources(resources);
List<PDField> fields = new ArrayList<>(acroForm.getFields());
processFields(fields, resources);
acroForm.flatten();
doc.save(dst);
doc.close();
}
private static void processFields(List<PDField> fields, PDResources resources) {
fields.stream().forEach(f -> {
f.setReadOnly(true);
COSDictionary cosObject = f.getCOSObject();
String value = cosObject.getString(COSName.DV) == null ?
cosObject.getString(COSName.V) : cosObject.getString(COSName.DV);
System.out.println("Setting " + f.getFullyQualifiedName() + ": " + value);
try {
f.setValue(value);
} catch (IOException e) {
if (e.getMessage().matches("Could not find font: /.*")) {
String fontName = e.getMessage().replaceAll("^[^/]*/", "");
System.out.println("Adding fallback font for: " + fontName);
resources.put(COSName.getPDFName(fontName), PDType1Font.HELVETICA);
try {
f.setValue(value);
} catch (IOException e1) {
e1.printStackTrace();
}
} else {
e.printStackTrace();
}
}
if (f instanceof PDNonTerminalField) {
processFields(((PDNonTerminalField) f).getChildren(), resources);
}
});
}
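Answer 10 (score: 0)
If the PDF document does not actually contain form fields but you still want to flatten other elements such as markups, the following works quite well. FYI, it was implemented for C#.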
public static void FlattenPdf(string fileName)
{
PDDocument doc = PDDocument.load(new java.io.File(fileName));
java.util.List annots = doc.getPage(0).getAnnotations();
for (int i = 0; i < annots.size(); ++i)
{
PDAnnotation annot = (PDAnnotation)annots.get(i);
annot.setLocked(true);
annot.setReadOnly(true);
annot.setNoRotate(true);
}
doc.save(fileName);
doc.close();
}
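This effectively locks the markups so that they are no longer editable. Note that the code above only processes the first page, getPage(0).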