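I am trying to determine whether the text on a PDF page flows vertically (bottom to top) rather than horizontally (left to right). To do that, I tried the answer to "How to find pdf is portrait or landscape using PDFBOX Library in Java". I am using PDFBox 2.0.3, and this is my code: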
public PDFTextStripperChildClass() throws IOException {
    super();
    setSortByPosition(false);
    setShouldSeparateByBeads(true);
    setPageStart("");
    setPageEnd("");
    setParagraphEnd("");
    setParagraphStart("");
    setArticleStart("");
    setArticleEnd("");
    setLineSeparator("\n");
    isParsingOngoing = false;
}

@Override
protected void writeString(String txt, List<TextPosition> tp) {
    ...
    PDPage pdp = this.getCurrentPage();
    PDRectangle pdr = pdp.getMediaBox();
    boolean isLandscape = pdr.getWidth() > pdr.getHeight();
    int rotation = pdp.getRotation();
    if (isLandscape || rotation == 90 || rotation == 270) {
        System.out.println("landscape or rotated");
    }
    ...
}
However, when I run it on a document that contains a rotated landscape page, nothing is printed. Am I not setting up PDFTextStripper correctly?
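To make the goal concrete: the property in question is the direction of the text baseline, which is a separate thing from the page's MediaBox proportions or its rotation value. Below is a minimal, PDFBox-independent sketch of that check. It assumes the a and b components of a glyph's text matrix [a b c d e f] are available from somewhere; the class and method names are hypothetical and are not part of PDFBox.

```java
// Hypothetical helper, NOT part of PDFBox: classifies the baseline direction
// of a piece of text from the a and b components of its text matrix
// [a b c d e f]. In PDF user space the y axis points up, so the vector (a, b)
// is the direction in which the baseline advances.
final class TextDirectionSketch {

    /** Baseline angle in whole degrees, normalised to the range [0, 360). */
    static int baselineAngle(float a, float b) {
        double degrees = Math.toDegrees(Math.atan2(b, a));
        int rounded = (int) Math.round(degrees);
        return ((rounded % 360) + 360) % 360;
    }

    /** True when the baseline runs up the page (within 45 degrees of 90). */
    static boolean isBottomToTop(float a, float b) {
        int angle = baselineAngle(a, b);
        return angle > 45 && angle < 135;
    }

    public static void main(String[] args) {
        // Identity-like matrix: ordinary left-to-right text.
        System.out.println(isBottomToTop(1f, 0f)); // false
        // Matrix rotated 90 degrees counter-clockwise: bottom-to-top text.
        System.out.println(isBottomToTop(0f, 1f)); // true
    }
}
```

The sketch only looks at the text's own matrix, so it would report bottom-to-top text even on a page whose rotation is 0 and whose MediaBox is portrait, which is the case the page-level check in writeString cannot see.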