Error when trying to run a PDFBox program

Date: 2013-09-27 22:05:52

Tags: java linux pdf pdfbox

I am trying to run the PDFBox example from this page, which extracts text from a PDF file: http://www.printmyfolders.com/Home/PDFBox-Tutorial
When I run it, I get the following error:

org.apache.pdfbox.exceptions.WrappedIOException
   at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:245)
   at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1192)
   at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1159)
   at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1130)
   at GetPos.main(GetPos.java:14)
Caused by: java.lang.ArrayIndexOutOfBoundsException
   at java.lang.System.arraycopy(libgcj.so.10)
   at java.io.ByteArrayOutputStream.write(libgcj.so.10)
   at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:172)
   at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:98)
   at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:295)
   at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:237)
   at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:172)
   at org.apache.pdfbox.pdfparser.PDFXrefStreamParser.<init>(PDFXrefStreamParser.java:61)
   at org.apache.pdfbox.pdfparser.PDFParser.parseXrefStream(PDFParser.java:848)
   at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:576)
   at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:188)
   ... 4 more

What does this mean? The first example, which creates a blank PDF, works fine.
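The innermost frames of the trace show the exception being thrown while PDFBox's FlateFilter decompresses a stream (here a cross-reference stream, per PDFXrefStreamParser) into a ByteArrayOutputStream. For readers unfamiliar with that step, here is a minimal sketch of Flate (zlib) decoding with the JDK's java.util.zip.Inflater. It only illustrates the technique and is not PDFBox's own implementation; the class name FlateDemo and the 1024-byte buffer are illustrative choices.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical demo class: Flate round trip with java.util.zip (not PDFBox code).
public class FlateDemo {

    // Inflate zlib/Flate-compressed bytes, the kind of data a PDF /FlateDecode stream holds.
    static byte[] inflate(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater(); // zlib format (header + Adler-32), as used by FlateDecode
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    // Input ran out before the end of the stream: treat it as damaged data.
                    throw new DataFormatException("truncated or incomplete Flate stream");
                }
                out.write(buf, 0, n); // copy only the n bytes actually produced
            }
        } finally {
            inflater.end(); // free native zlib resources
        }
        return out.toByteArray();
    }

    // Compress bytes in zlib format so the demo has valid input to decode.
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] original = "Hello from www.printmyfolders.com".getBytes(StandardCharsets.US_ASCII);
        byte[] roundTrip = inflate(deflate(original));
        System.out.println(new String(roundTrip, StandardCharsets.US_ASCII));
    }
}
```

A well-formed stream decodes cleanly; a truncated one ends with a DataFormatException instead of silently producing partial data.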

2 answers:

Answer 0 (score: 0)

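Use that example to generate a PDF containing text, then read the text back with the related tutorial. Both steps combined in a JUnit test: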
package com.mycompany.mavenproject;

import java.io.File;
import junit.framework.Test;
import junit.framework.TestCase;
import junit.framework.TestSuite;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.edit.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.util.PDFTextStripper;

/**
 * Round-trip test: writes a PDF containing one line of text, then extracts it again.
 */
public class AppTest extends TestCase {

    public static Test suite() {
        return new TestSuite(AppTest.class);
    }

    public void test() throws Exception {
        final String fileName = "PDFWithText.pdf";
        writeDocument(fileName);
        PDDocument pd = PDDocument.load(new File(fileName));
        try {
            PDFTextStripper stripper = new PDFTextStripper();
            String text = stripper.getText(pd);
            assertEquals("Hello from www.printmyfolders.com", text.trim());
        } finally {
            pd.close(); // release the document even if the assertion fails
        }
    }

    // Creates a one-page PDF with a single line of Helvetica Bold text (PDFBox 1.x API).
    private void writeDocument(String fileName) throws Exception {
        PDDocument doc = new PDDocument();
        PDPage page = new PDPage();
        doc.addPage(page);
        PDFont font = PDType1Font.HELVETICA_BOLD;

        PDPageContentStream content = new PDPageContentStream(doc, page);
        content.beginText();
        content.setFont(font, 12);
        content.moveTextPositionByAmount(100, 700); // text origin, in points from the lower-left corner
        content.drawString("Hello from www.printmyfolders.com");
        content.endText();
        content.close();

        doc.save(fileName);
        doc.close();
    }
}

This runs without any exception. Since the exception bubbles up from the load method, make sure your PDF is not malformed.
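As a rough first check for a malformed file, here is a sketch that assumes a well-formed PDF carries the %PDF- header near its start and the %%EOF marker near its end. The 1024-byte windows and the class name PdfSanityCheck are illustrative choices, not part of PDFBox, and the whole file is read into memory for simplicity.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: a heuristic sanity check, not a PDF validator.
public class PdfSanityCheck {

    // How many bytes at each end of the file to search (illustrative choice).
    static final int WINDOW = 1024;

    // True if a "%PDF-" header appears near the start and "%%EOF" near the end.
    static boolean looksLikePdf(byte[] data) {
        // ISO-8859-1 maps every byte to one char, so binary content is safe to scan.
        String head = new String(data, 0, Math.min(WINDOW, data.length), StandardCharsets.ISO_8859_1);
        int tailStart = Math.max(0, data.length - WINDOW);
        String tail = new String(data, tailStart, data.length - tailStart, StandardCharsets.ISO_8859_1);
        return head.contains("%PDF-") && tail.contains("%%EOF");
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args.length > 0 ? args[0] : "somefile.pdf");
        byte[] data = Files.readAllBytes(path);
        if (looksLikePdf(data)) {
            System.out.println(path + ": header and end-of-file marker found");
        } else {
            System.out.println(path + ": missing %PDF- header or %%EOF marker (truncated or not a PDF?)");
        }
    }
}
```

Passing this check does not prove the file is valid (a damaged compressed stream inside it would still fail to parse); it only rules out obviously truncated or non-PDF input.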

Answer 1 (score: 0)

Use a temporary directory:

parser.setTempDirectory(new File(directoryPath));

Example:

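import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;

File in = new File("somefile.pdf");
InputStream fin = new FileInputStream(in);
PDFParser parser = new PDFParser(fin);
parser.setTempDirectory(new File(tempDirectoryPath)); // tempDirectoryPath: any writable directory
parser.parse();
PDDocument document = parser.getPDDocument();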
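The example above leaves tempDirectoryPath undefined. One way to obtain a fresh, writable directory is the JDK's Files.createTempDirectory; the sketch below assumes that is acceptable for your setup, and the class name TempDirForParser and the "pdfbox-scratch-" prefix are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: creates a scratch directory to hand to the parser.
public class TempDirForParser {

    // Creates a new directory under the system temp location (java.io.tmpdir).
    static File createScratchDir() throws IOException {
        Path dir = Files.createTempDirectory("pdfbox-scratch-");
        File f = dir.toFile();
        f.deleteOnExit(); // deleted at JVM exit, but only if the directory is empty by then
        return f;
    }

    public static void main(String[] args) throws IOException {
        File scratch = createScratchDir();
        System.out.println("writable: " + scratch.canWrite() + " -> " + scratch.getAbsolutePath());
        // With PDFBox 1.x on the classpath, pass it to the parser:
        // parser.setTempDirectory(scratch);
    }
}
```

Using a dedicated directory rather than a hard-coded path avoids failures when the chosen location does not exist or is not writable by the JVM user.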