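How to find all rectangles in a PDF using iText

Time: 2018-07-25 01:21:37

Tags: pdf itext rectangles

I have an MS Word document containing text boxes (rectangles), which I successfully converted to PDF using LibreOffice. How can I find all the text boxes (rectangles) in the PDF, and how should I interpret the rectangles' coordinates?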

@Override
public void modifyPath(PathConstructionRenderInfo renderInfo) {
    if (renderInfo.getOperation() == PathConstructionRenderInfo.RECT) {
        // Operands of the rectangle operator: corner (x, y), width, height
        float x = renderInfo.getSegmentData().get(0);
        float y = renderInfo.getSegmentData().get(1);
        float w = renderInfo.getSegmentData().get(2);
        float h = renderInfo.getSegmentData().get(3);
        // Map two opposite corners through the current transformation matrix
        Vector a = new Vector(x, y, 1).cross(renderInfo.getCtm());
        Vector c = new Vector(x + w, y + h, 1).cross(renderInfo.getCtm());
    }
}

Implementing ExtRenderListener this way only finds the page (A4) rectangle, not the (text box) rectangles that enclose the content on the page.
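On interpreting the coordinates: the operands of a rectangle operator are expressed in whatever coordinate system is in effect at that moment, so they have to be mapped through the current transformation matrix (CTM) to land in default page space, where one unit is 1/72 inch unless the page says otherwise. The following is a minimal, iText-independent sketch of that mapping. The names `CtmDemo`, `transform`, and `rectToPageSpace` are hypothetical, and the CTM in `main` is made up for illustration. All four corners are transformed so the result stays a valid bounding box even if the CTM rotates or skews.

```java
import java.util.Arrays;

class CtmDemo {
    /**
     * Applies a PDF matrix given as [a b c d e f] to the point (x, y).
     * PDF treats points as row vectors: x' = a*x + c*y + e, y' = b*x + d*y + f.
     */
    static double[] transform(double[] ctm, double x, double y) {
        return new double[] {
            ctm[0] * x + ctm[2] * y + ctm[4],
            ctm[1] * x + ctm[3] * y + ctm[5]
        };
    }

    /** Returns {minX, minY, maxX, maxY} of an "x y w h" rectangle after applying the CTM. */
    static double[] rectToPageSpace(double[] ctm, double x, double y, double w, double h) {
        double[][] corners = {
            transform(ctm, x, y),
            transform(ctm, x + w, y),
            transform(ctm, x, y + h),
            transform(ctm, x + w, y + h)
        };
        double minX = Double.MAX_VALUE, minY = Double.MAX_VALUE;
        double maxX = -Double.MAX_VALUE, maxY = -Double.MAX_VALUE;
        for (double[] p : corners) {
            minX = Math.min(minX, p[0]);
            minY = Math.min(minY, p[1]);
            maxX = Math.max(maxX, p[0]);
            maxY = Math.max(maxY, p[1]);
        }
        return new double[] {minX, minY, maxX, maxY};
    }

    public static void main(String[] args) {
        // Hypothetical CTM: scale by 0.5, then translate by (10, 20).
        double[] ctm = {0.5, 0, 0, 0.5, 10, 20};
        // Negative height is legal in PDF; the bounding box normalizes it.
        double[] box = rectToPageSpace(ctm, 100, 200, 40, -60);
        System.out.println(Arrays.toString(box)); // prints [60.0, 90.0, 80.0, 120.0]
    }
}
```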

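1 Answer:

Answer 0: (score: 2)

As Bruno pointed out, the problem is that you may encounter rectangles that are defined only by line-to or move-to operations.

You will need to keep track of all line-drawing operations and "aggregate" them as soon as they connect, that is, whenever a line is drawn whose start or end point matches the start or end point of an already known line.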
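To show the grouping idea on its own, here is a minimal sketch that uses plain coordinates instead of iText types. It is an illustration under assumptions, not the class from this answer: `SegmentClusters`, its methods, and the tolerance `EPS` are hypothetical names and values. Each new segment is merged (union-find) with every earlier segment it shares an endpoint with.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SegmentClusters {
    private static final double EPS = 1e-3; // assumed endpoint-matching tolerance

    private final List<double[]> segments = new ArrayList<>(); // each is {x1, y1, x2, y2}
    private final List<Integer> parent = new ArrayList<>();    // union-find parent links

    /** Registers a segment and merges it with every known segment sharing an endpoint. */
    void add(double x1, double y1, double x2, double y2) {
        int id = segments.size();
        segments.add(new double[] {x1, y1, x2, y2});
        parent.add(id); // a new segment starts as its own cluster
        for (int other = 0; other < id; other++) {
            if (sharesEndpoint(segments.get(id), segments.get(other))) {
                parent.set(find(id), find(other));
            }
        }
    }

    private static boolean sharesEndpoint(double[] s, double[] t) {
        return same(s[0], s[1], t[0], t[1]) || same(s[0], s[1], t[2], t[3])
            || same(s[2], s[3], t[0], t[1]) || same(s[2], s[3], t[2], t[3]);
    }

    private static boolean same(double x1, double y1, double x2, double y2) {
        return Math.abs(x1 - x2) < EPS && Math.abs(y1 - y2) < EPS;
    }

    /** Follows parent links until reaching the cluster's root. */
    private int find(int id) {
        while (parent.get(id) != id) {
            id = parent.get(id);
        }
        return id;
    }

    /** Groups all registered segments by their cluster root. */
    Collection<List<double[]>> clusters() {
        Map<Integer, List<double[]>> out = new HashMap<>();
        for (int id = 0; id < segments.size(); id++) {
            out.computeIfAbsent(find(id), k -> new ArrayList<>()).add(segments.get(id));
        }
        return out.values();
    }
}
```

Unlike the class below, this sketch keeps checking after the first match, so a segment that bridges two existing groups joins them into one, and it compares endpoints with a tolerance rather than exact equality.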

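// Imports assume the iText 7.1+ package layout
import com.itextpdf.kernel.colors.Color;
import com.itextpdf.kernel.colors.DeviceGray;
import com.itextpdf.kernel.colors.IccBased;
import com.itextpdf.kernel.geom.IShape;
import com.itextpdf.kernel.geom.Line;
import com.itextpdf.kernel.geom.Path;
import com.itextpdf.kernel.geom.Point;
import com.itextpdf.kernel.geom.Rectangle;
import com.itextpdf.kernel.geom.Subpath;
import com.itextpdf.kernel.pdf.canvas.parser.EventType;
import com.itextpdf.kernel.pdf.canvas.parser.data.IEventData;
import com.itextpdf.kernel.pdf.canvas.parser.data.PathRenderInfo;
import com.itextpdf.kernel.pdf.canvas.parser.listener.IEventListener;

import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class RectangleFinder implements IEventListener {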

    private Map<Line, Integer> knownLines = new HashMap<>();
    private Map<Integer, Integer> clusters = new HashMap<>();

    public void eventOccurred(IEventData data, EventType type) {
        if(data instanceof PathRenderInfo){
            PathRenderInfo pathRenderInfo = (PathRenderInfo) data;
            pathRenderInfo.preserveGraphicsState();
            Path path = pathRenderInfo.getPath();
            if(pathRenderInfo.getOperation() == PathRenderInfo.NO_OP)
                return;
            if(pathRenderInfo.getOperation() != PathRenderInfo.FILL)
                return;
            if(!isBlack(pathRenderInfo.getFillColor()))
                return;
            for(Subpath sPath : path.getSubpaths()){
                for(IShape segment : sPath.getSegments()) {
                    if(segment instanceof Line) {
                        lineOccurred((Line) segment);
                    }
                }
            }
        }
    }

    private boolean isBlack(Color c){
        if(c instanceof IccBased){
            IccBased col01 = (IccBased) c;
            return col01.getNumberOfComponents() == 1 && col01.getColorValue()[0] == 0.0f;
        }
        if(c instanceof DeviceGray){
            DeviceGray col02 = (DeviceGray) c;
            return col02.getNumberOfComponents() == 1 && col02.getColorValue()[0] == 0.0f;
        }
        return false;
    }

    private void lineOccurred(Line line){
        int ID = 0;
        if(!knownLines.containsKey(line)) {
            ID = knownLines.size();
            knownLines.put(line, ID);
        }else{
            ID = knownLines.get(line);
        }

        Point start = line.getBasePoints().get(0);
        Point end = line.getBasePoints().get(1);
        for(Line line2 : knownLines.keySet()){
            if(line.equals(line2))
                continue;
            if(line2.getBasePoints().get(0).equals(start)
                    || line2.getBasePoints().get(1).equals(end)
                    || line2.getBasePoints().get(0).equals(end)
                    || line2.getBasePoints().get(1).equals(start)){
                int ID2 = find(knownLines.get(line2));
                clusters.put(ID, ID2);
                break;
            }
        }
    }

    private int find(int ID){
        int out = ID;
        while(clusters.containsKey(out))
            out = clusters.get(out);
        return out;
    }

    public Set<EventType> getSupportedEvents() {
        // Returning null means this listener receives all event types
        return null;
    }

    public Collection<Set<Line>> getClusters(){
        Map<Integer, Set<Line>> out = new HashMap<>();
        for(Integer val : clusters.values())
            out.put(val, new HashSet<Line>());
        out.put(-1, new HashSet<Line>());
        for(Line l : knownLines.keySet()){
            int clusterID = clusters.containsKey(knownLines.get(l)) ? clusters.get(knownLines.get(l)) : -1;
            out.get(clusterID).add(l);
        }
        out.remove(-1);
        return out.values();
    }

    public Collection<Rectangle> getBoundingBoxes(){
        Set<Rectangle> rectangles = new HashSet<>();
        for(Set<Line> cluster : getClusters()){
            double minX = Double.MAX_VALUE;
            double minY = Double.MAX_VALUE;
            double maxX = -Double.MAX_VALUE;
            double maxY = -Double.MAX_VALUE;
            for(Line l : cluster){
                for(Point p : l.getBasePoints()){
                    minX = Math.min(minX, p.x);
                    minY = Math.min(minY, p.y);
                    maxX = Math.max(maxX, p.x);
                    maxY = Math.max(maxY, p.y);
                }
            }
            double w = (maxX - minX);
            double h = (maxY - minY);
            rectangles.add(new Rectangle((float) minX, (float) minY, (float) w, (float) h));
        }
        return rectangles;
    }
}

This is a class I wrote to find black (filled) rectangles on a page. With minor tweaks, it can find other rectangles as well.
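One possible tweak, sketched here without iText types: a bounding box can be computed for any cluster of lines, so it may be worth checking that a cluster really is a rectangle before reporting it. The check below is a hypothetical helper (`RectangleCheck.isAxisAlignedRectangle` is not part of iText or of the class above). It assumes axis-aligned rectangles and assumes all four sides are present as explicit segments; a subpath that is closed implicitly may carry only three, in which case the closing side would have to be added first.

```java
import java.util.Arrays;
import java.util.List;

class RectangleCheck {
    /** Each segment is {x1, y1, x2, y2}; eps is the coordinate-matching tolerance. */
    static boolean isAxisAlignedRectangle(List<double[]> segs, double eps) {
        if (segs.size() != 4) {
            return false;
        }
        double minX = Double.MAX_VALUE, minY = Double.MAX_VALUE;
        double maxX = -Double.MAX_VALUE, maxY = -Double.MAX_VALUE;
        for (double[] s : segs) {
            minX = Math.min(minX, Math.min(s[0], s[2]));
            maxX = Math.max(maxX, Math.max(s[0], s[2]));
            minY = Math.min(minY, Math.min(s[1], s[3]));
            maxY = Math.max(maxY, Math.max(s[1], s[3]));
        }
        if (maxX - minX <= eps || maxY - minY <= eps) {
            return false; // degenerate: zero width or zero height
        }
        int left = 0, right = 0, bottom = 0, top = 0;
        for (double[] s : segs) {
            // A side must be vertical or horizontal and span the full box extent.
            boolean vertical = Math.abs(s[0] - s[2]) < eps
                    && Math.abs(Math.min(s[1], s[3]) - minY) < eps
                    && Math.abs(Math.max(s[1], s[3]) - maxY) < eps;
            boolean horizontal = Math.abs(s[1] - s[3]) < eps
                    && Math.abs(Math.min(s[0], s[2]) - minX) < eps
                    && Math.abs(Math.max(s[0], s[2]) - maxX) < eps;
            if (vertical && Math.abs(s[0] - minX) < eps) {
                left++;
            } else if (vertical && Math.abs(s[0] - maxX) < eps) {
                right++;
            } else if (horizontal && Math.abs(s[1] - minY) < eps) {
                bottom++;
            } else if (horizontal && Math.abs(s[1] - maxY) < eps) {
                top++;
            }
        }
        // Exactly one segment on each side of the bounding box.
        return left == 1 && right == 1 && bottom == 1 && top == 1;
    }

    public static void main(String[] args) {
        List<double[]> square = Arrays.asList(
            new double[] {0, 0, 10, 0}, new double[] {10, 0, 10, 5},
            new double[] {10, 5, 0, 5}, new double[] {0, 5, 0, 0});
        System.out.println(isAxisAlignedRectangle(square, 1e-3)); // prints true
    }
}
```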