Question

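I haven't written code in a long time, and I decided to write a program that downloads the current Official World Golf Ranking in PDF form and then displays the top 10 using a JLabel.

The program can download the file, but I can't work out how to extract individual cells from the table that contains the data, i.e. how to pull the "This Week", "Name", and "Country" columns into separate arrays.

Can anyone give me some advice on how to do this?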

Answer 1

I recently had to do something similar, and my code looked like this (using PDFBox):

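import java.io.FileInputStream;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper; // PDFBox 1.x package

// Parse the downloaded PDF into a document object
PDFParser pdfParser = new PDFParser(new FileInputStream("c:\\temp\\owgr49f2013.pdf"));
pdfParser.parse();
PDDocument pdDocument = pdfParser.getPDDocument();
// Extract the text, putting "###" between words so fields are easier to split
PDFTextStripper stripper = new PDFTextStripper("UTF-8");
stripper.setSortByPosition(false);
stripper.setWordSeparator("###");
System.out.println(stripper.getText(pdDocument));
pdDocument.close(); // release the document's resources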

You will then need to use regular expressions to extract the information you need from the resulting text.
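As a rough sketch of that regular-expression step: the class below pulls the first three fields out of each ranking row and builds a string a JLabel could show. The row layout it expects, `<thisWeek>###<name>###<country>###...`, is only an assumption for illustration; print the real stripper output first and adjust the pattern to whatever it actually looks like. The names `RankingParser`, `parseTopRows`, and `toLabelHtml` are hypothetical helpers, not part of PDFBox.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RankingParser {
    // ASSUMPTION: each row looks like "<thisWeek>###<name>###<country>###...".
    // Lines that do not start with a number (headers, footers) are skipped.
    private static final Pattern ROW = Pattern.compile(
            "^(\\d+)###([^#\\r\\n]+)###([^#\\r\\n]+)(?:###.*)?$", Pattern.MULTILINE);

    // Returns up to 'limit' rows, each as {thisWeek, name, country}.
    public static List<String[]> parseTopRows(String text, int limit) {
        List<String[]> rows = new ArrayList<>();
        Matcher m = ROW.matcher(text);
        while (rows.size() < limit && m.find()) {
            rows.add(new String[] { m.group(1), m.group(2).trim(), m.group(3).trim() });
        }
        return rows;
    }

    // Builds an HTML string; Swing's JLabel renders text that starts with <html>.
    public static String toLabelHtml(List<String[]> rows) {
        StringBuilder sb = new StringBuilder("<html>");
        for (String[] r : rows) {
            sb.append(r[0]).append(". ").append(r[1])
              .append(" (").append(r[2]).append(")<br>");
        }
        return sb.append("</html>").toString();
    }
}
```

Under those assumptions, something like `label.setText(RankingParser.toLabelHtml(RankingParser.parseTopRows(stripper.getText(pdDocument), 10)))` would show the top 10 in one label; if you need the three columns as separate arrays, copy index 0, 1, and 2 of each row into them.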

How to extract text into a JLabel using PDFBox
