Question

Is there a known issue with PDF version 1.3 in Apache PDFBox? When I try to extract text from a PDF document whose header declares version 1.3, I get an exception:

java.util.zip.DataFormatException: incorrect header check
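For context, `DataFormatException` comes from the JDK's `java.util.zip.Inflater`, which raises it when the bytes it is given are not a well-formed zlib stream. The following is a minimal, JDK-only sketch of that behavior. It does not use PDFBox, and the class and method names are made up for illustration. It only shows how this kind of exception arises, not where exactly PDFBox 1.8.10 triggers it.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

public class InflaterHeaderDemo {
    // Hypothetical helper: returns true if the JDK inflater rejects
    // the given bytes as a malformed zlib stream.
    public static boolean failsToInflate(byte[] data) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(data);
            inflater.inflate(new byte[64]);
            return false;
        } catch (DataFormatException e) {
            // The first two bytes did not form a valid zlib header,
            // or the data was otherwise corrupt.
            return true;
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        // Plain ASCII text is not a zlib stream.
        byte[] notZlib = "Hello, this is not a zlib stream"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(failsToInflate(notZlib));
    }
}
```

If the exception shows up for only some files, one plausible reading is that a compressed stream in those files is being decoded from the wrong offset or with the wrong filter, but that is an assumption, not something confirmed here.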

PDF files with versions 1.4 and 1.5 work. If I manually convert the file from version 1.3 to 1.4 with an external tool, it works as well.
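To confirm which version a given file actually declares, the header line (`%PDF-x.y`) can be read from the first bytes of the file. This is a rough JDK-only sketch. The class and method names are hypothetical, and it looks only at the header. In PDF 1.4 and later the document catalog may carry a `/Version` entry that overrides the header, which this sketch ignores.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PdfHeaderVersion {
    private static final Pattern HEADER = Pattern.compile("%PDF-(\\d\\.\\d)");

    // Hypothetical helper: returns the version declared in the header
    // (for example "1.3"), or null if no header is found in the bytes.
    // Pass in the first kilobyte or so of the file.
    public static String fromBytes(byte[] head) {
        // ISO-8859-1 maps every byte to one char, so binary data is safe here.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        Matcher m = HEADER.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.3\n".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(fromBytes(sample));
    }
}
```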

Here is the code I am using:

// Parse the file into a low-level COSDocument (PDFBox 1.8.x API).
final PDFParser parser = new PDFParser(new FileInputStream(fileName));
parser.parse();
cosDoc = parser.getDocument();
// Wrap it in a PDDocument and extract the text.
final PDFTextStripper pdfStripper = new PDFTextStripper();
pdDoc = new PDDocument(cosDoc);
pdfStripper.setAddMoreFormatting(true);
text = pdfStripper.getText(pdDoc).trim();

I am using Apache PDFBox 1.8.10.

Thanks!

!!!FIXED!!!

It looks like there is a bug in 1.8.10. I updated the framework to version 2.0.6, and now it works.

Answer 1

It looks like there is a bug in 1.8.10. I updated the framework to version 2.0.6, and with the same PDF file it now works.
