Question

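I am using PDFBox to extract the contents of a PDF file. I can extract the text, but I also need the font attributes of that text. Can anyone help me extract the font attributes?

I am also having trouble extracting some characters correctly. PDFBox outputs '?' when it cannot recognize a character. If possible, please also suggest how to fix that.

Thanks in advance.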

Answer 1

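import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

// Uses the PDFBox 1.x package layout (org.apache.pdfbox.util.PDFTextStripper).
public class Pdf2Box {
    public static void main(String[] args) throws IOException {
        PDDocument document = PDDocument.load("table2.pdf");
        try {
            PDFTextStripper textStripper = new PDFTextStripper();
            // Extract and print the text of the whole document.
            System.out.println(textStripper.getText(document));
            // Print the stripper's font map; the original call discarded the return value.
            System.out.println(textStripper.getFonts());
        } finally {
            // Always release the document, even if extraction fails.
            document.close();
        }
    }
}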
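
On the '?' characters: one possible cause (an assumption, since the question does not say how the text is printed) is that the characters are lost when printing, not during extraction. A Java `PrintStream` replaces any character its charset cannot encode with '?'. The sketch below uses only the standard library; `EncodingCheck`, `roundTrip`, and the sample string are made-up stand-ins for the text returned by `getText`.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class EncodingCheck {
    // Prints text through a PrintStream that uses the given charset,
    // then decodes the written bytes with the same charset.
    public static String roundTrip(String text, String charsetName)
            throws UnsupportedEncodingException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buffer, true, charsetName);
        out.print(text);
        out.flush();
        return buffer.toString(charsetName);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for the string returned by textStripper.getText(...).
        String extracted = "caf\u00e9 \u4e2d\u6587";
        // US-ASCII cannot encode the non-ASCII characters, so they become '?'.
        System.out.println(roundTrip(extracted, "US-ASCII"));
        // UTF-8 can encode every character, so the text survives unchanged.
        System.out.println(roundTrip(extracted, "UTF-8").equals(extracted));
    }
}
```

If the output looks like the US-ASCII case, writing through `new PrintStream(System.out, true, "UTF-8")` or to a UTF-8 file may help. If the '?' is already present in the extracted string itself, the cause may lie in the PDF's fonts, which this sketch does not address.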
