How can I get the coordinates of vertical strings from a PDF using Java and PDFBox?

Time: 2018-12-26 05:30:01

Tags: java pdfbox

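I am using PDFBox with Java to find the position (coordinates) of text in a PDF. The positions of horizontal text come out correctly, but the positions of vertical text are wrong. Here is the code: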

protected void writeString(String string, List<TextPosition> textPositions) throws IOException {
    String wordSeparator = getWordSeparator();
    List<TextPosition> word = new ArrayList<>();

    for (TextPosition text : textPositions) {
        String thisChar = text.getUnicode();
        // pageWid and pageHe are fields of the enclosing class (declarations not shown).
        pageWid = text.getPageWidth();
        pageHe = text.getPageHeight();
        if (thisChar == null || thisChar.isEmpty()) {
            continue;
        }
        if (!thisChar.equals(wordSeparator)) {
            // Still inside a word: collect the glyph.
            word.add(text);
        } else if (!word.isEmpty()) {
            // Separator reached: emit the collected word.
            printWord(word);
            word.clear();
        }
    }
    // Emit the last word if the line does not end with a separator.
    if (!word.isEmpty()) {
        printWord(word);
        word.clear();
    }
}

void printWord(List<TextPosition> word) {
    Rectangle2D boundingBox = null;
    StringBuilder builder = new StringBuilder();

    for (TextPosition text : word) {
        /*  int rot = text.getRotation();
          System.out.println("rotation is " + rot);*/
        // getX()/getY() are adjusted for page rotation; getWidthDirAdj() is the
        // width measured along the text direction.
        Rectangle2D box = new Rectangle2D.Float(text.getX(), text.getY(), text.getWidthDirAdj(),
                text.getHeightDir());

        // Grow the word's bounding box to include this glyph.
        if (boundingBox == null)
            boundingBox = box;
        else
            boundingBox.add(box);
        builder.append(text.getUnicode());
    }

    String words = builder.toString().toLowerCase();
    System.out.println("the word is " + words + " length is " + words.length());

    // Single-character words are skipped.
    if (words.length() != 1) {
        returncordinates(words, boundingBox.getX(), boundingBox.getY(), boundingBox.getHeight(),
                boundingBox.getWidth());
    }
}

With this code I can find the coordinates of horizontal text:

Output: think {'x': 80.97092447794397, 'y': 36.56483345224815, 'height': 1.3265644959675444, 'page_no': 3, 'width': 6.51770994998429}

However, when the text is vertical, the coordinates are inaccurate (image attached).
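To make the problem concrete, here is a minimal sketch of the geometry only, written without PDFBox so it runs on its own. It assumes each glyph is described by a baseline start point in page space with y growing downward, an advance width along the text direction, a glyph height, and a direction of 0, 90, 180 or 270 degrees counted counter-clockwise as seen on the page. In PDFBox the direction could presumably be taken from TextPosition.getDir(), but whether getX()/getY() match the anchor point assumed here would need to be checked against a real document. The names DirectionAwareBox and toPageBox are hypothetical.

```java
import java.awt.geom.Rectangle2D;

public class DirectionAwareBox {

    /**
     * Builds an axis-aligned box for one glyph.
     * (x, y): baseline start of the glyph in page space, y growing downward (assumption).
     * width: advance along the text direction; height: glyph height.
     * dirDegrees: text direction, counter-clockwise as seen on the page (assumed convention).
     */
    public static Rectangle2D toPageBox(float x, float y, float width, float height, int dirDegrees) {
        switch (((dirDegrees % 360) + 360) % 360) {
            case 0:   // left to right, glyph rises above the baseline
                return new Rectangle2D.Float(x, y - height, width, height);
            case 90:  // bottom to top, glyph extends to the left of the baseline
                return new Rectangle2D.Float(x - height, y - width, height, width);
            case 180: // right to left, glyph hangs below the baseline
                return new Rectangle2D.Float(x - width, y, width, height);
            case 270: // top to bottom, glyph extends to the right of the baseline
                return new Rectangle2D.Float(x, y, height, width);
            default:
                throw new IllegalArgumentException("unsupported direction: " + dirDegrees);
        }
    }

    public static void main(String[] args) {
        // Hypothetical three-glyph word written bottom to top (direction 90).
        float[][] glyphs = { {100f, 300f}, {100f, 290f}, {100f, 280f} };
        Rectangle2D wordBox = null;
        for (float[] g : glyphs) {
            Rectangle2D box = toPageBox(g[0], g[1], 10f, 12f, 90);
            if (wordBox == null) {
                wordBox = box;
            } else {
                wordBox.add(box); // grow the union to include this glyph
            }
        }
        System.out.println(wordBox);
    }
}
```

The point of the sketch is that for directions 90 and 270 the width and height swap roles in page space, which a box built as (x, y, width, height) regardless of direction does not account for.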

0 answers:

No answers