当pdfbox中的位置已知时,使用pdfbox突出显示文本

时间:2013-08-27 10:58:34

标签: java apache pdf pdfbox

当我有它的坐标时,pdfbox是否提供了一些用于突出显示文本的实用程序?

文本的界限是已知的。

我知道还有其他库提供与pdfclown等相同的功能。但pdfbox是否提供类似的功能?

4 个答案:

答案 0 :(得分:6)

好吧,我发现了这一点。这很简单。

PDDocument doc = PDDocument.load(/*path to the file*/);
PDPage page = (PDPage)doc.getDocumentCatalog.getAllPages.get(i);
List annots = page.getAnnotations;
PDAnnotationTextMarkup markup = new PDAnnotationTextMarkup(PDAnnotationTextMarkup.Su....);
markup.setRectangle(/*your PDRectangle*/);
markup.setQuads(/*float array of size eight with all the vertices of the PDRectangle in anticlockwise order*/);
annots.add(markup);
doc.save(/*path to the output file*/);

答案 1 :(得分:2)

这是数字1的扩展答案,基本上与上面的代码相同。

相对于当前文档中的页面大小,改进了坐标点,同时还改进了非常浅的黄色,有时如果单词较短且较小,则很难看到。

还要突出显示整个单词,并从左上角到右上角使用X,Y坐标。从字符串的第一个字符和最后一个字符获取坐标。

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.graphics.color.PDColor;
import org.apache.pdfbox.pdmodel.graphics.color.PDDeviceRGB;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationTextMarkup;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;

public class MainSource extends PDFTextStripper {

    public MainSource()  throws IOException {
        super();
    }

    public static void main(String[] args)  throws IOException {
        PDDocument document = null;
        String fileName = "C:/AnyPDFFile.pdf";
        try {
            document = PDDocument.load( new File(fileName) );
            PDFTextStripper stripper = new MainSource();
            stripper.setSortByPosition( true );

            stripper.setStartPage( 0 );
            stripper.setEndPage( document.getNumberOfPages() );

            Writer dummy = new OutputStreamWriter(new ByteArrayOutputStream());
            stripper.writeText(document, dummy);

            File file1 = new File("C:/AnyPDFFile-New.pdf");
            document.save(file1);
        }
        finally {
            if( document != null ) {
                document.close();
            }
        }
    }

    /**
     * Override the default functionality of PDFTextStripper.writeString()
     */

    @Override
    protected void writeString(String string, List<TextPosition> textPositions) throws IOException {
        boolean isFound = false;
        float posXInit  = 0, 
              posXEnd   = 0, 
              posYInit  = 0,
              posYEnd   = 0,
              width     = 0, 
              height    = 0, 
              fontHeight = 0;
        String[] criteria = {"Word1", "Word2", "Word3", ....};

        for (int i = 0; i < criteria.length; i++) {
            if (string.contains(criteria[i])) {
                isFound = true;
            } 
        }
        if (isFound) {
            posXInit = textPositions.get(0).getXDirAdj();
            posXEnd  = textPositions.get(textPositions.size() - 1).getXDirAdj() + textPositions.get(textPositions.size() - 1).getWidth();
            posYInit = textPositions.get(0).getPageHeight() - textPositions.get(0).getYDirAdj();
            posYEnd  = textPositions.get(0).getPageHeight() - textPositions.get(textPositions.size() - 1).getYDirAdj();
            width    = textPositions.get(0).getWidthDirAdj();
            height   = textPositions.get(0).getHeightDir();

            System.out.println(string + "X-Init = " + posXInit + "; Y-Init = " + posYInit + "; X-End = " + posXEnd + "; Y-End = " + posYEnd + "; Font-Height = " + fontHeight);

            /* numeration is index-based. Starts from 0 */

            float quadPoints[] = {posXInit, posYEnd + height + 2, posXEnd, posYEnd + height + 2, posXInit, posYInit - 2, posXEnd, posYEnd - 2};

            List<PDAnnotation> annotations = document.getPage(this.getCurrentPageNo() - 1).getAnnotations();
            PDAnnotationTextMarkup highlight = new PDAnnotationTextMarkup(PDAnnotationTextMarkup.SUB_TYPE_HIGHLIGHT);

            PDRectangle position = new PDRectangle();
            position.setLowerLeftX(posXInit);
            position.setLowerLeftY(posYEnd);
            position.setUpperRightX(posXEnd);
            position.setUpperRightY(posYEnd + height);

            highlight.setRectangle(position);

            // quadPoints is array of x,y coordinates in Z-like order (top-left, top-right, bottom-left,bottom-right) 
            // of the area to be highlighted

            highlight.setQuadPoints(quadPoints);

            PDColor yellow = new PDColor(new float[]{1, 1, 1 / 255F}, PDDeviceRGB.INSTANCE);
            highlight.setColor(yellow);
            annotations.add(highlight);
        }
    }

}

答案 2 :(得分:1)

这适用于pdfbox 2.0.7

PDDocument document = /* get doc */
/* numeration is index-based. Starts from 0 */
List<PDAnnotation> annotations = document.getPage(yourPageNumber - 1).getAnnotations();
PDAnnotationTextMarkup highlight = new PDAnnotationTextMarkup(PDAnnotationTextMarkup.SUB_TYPE_HIGHLIGHT);
highlight.setRectangle(PDRectangle.A4);
// quadPoints is array of x,y coordinates in Z-like order (top-left, top-right, bottom-left,bottom-right) 
// of the area to be highlighted
highlight.setQuadPoints(quadPoints);
PDColor yellow = new PDColor(new float[]{1, 1, 204 / 255F}, PDDeviceRGB.INSTANCE);
highlight.setColor(yellow);
annotations.add(highlight);

注意:如果将doc保存在文件中,则会显示此注释,但由于没有为此注释创建AppearanceStream,因此它不会出现在从页面创建的图像中。我用PDFBOX-3353

的代码草稿解决了这个问题

答案 3 :(得分:0)

最简单的方法...在所需位置绘制一个矩形,然后将高度设置为1,将填充颜色设置为BLACK。 或...

使用PDFBox ... //创建页面PDDocument doc = new PDDocument(); PDPage page1 =新的PDPage(); doc.addPage(page1); //创建流PDPageContentStream stream1 = new PDPageContentStream(doc,page1); //使用坐标简单地绘制下划线//其中第一个是x开始,第二个y开始,第三个x结束,第四个y结束stream1.drawLine(20,740,590,740); //绘制比一个像素厚的下划​​线//第一x开始第二y开始第三长度第四厚度stream1.addRect(345,568,70,2); stream1.setNonStrokingColor(Color.BLACK); stream1.fill(); –