How to get the page content height using pdfbox

Time: 2015-02-04 12:26:41

Tags: java pdf pdfbox

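Is it possible to get the height of the page content with pdfbox? I think I tried everything, but each (PDRectangle) returns the full height of the page: 842. At first I thought this was because the page number sits at the bottom of the page, but when I open the pdf in Illustrator, the whole content is inside a compound element and does not extend over the full page height. So if Illustrator can see it as a separate element and calculate its height, I guess this should also be possible in pdfbox.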

Sample page: (sketch image, not reproduced here)

1 answer:

Answer 0: (score: 1)

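In general

The PDF specification allows a PDF to provide several page boundaries, cf. this answer. Other than those, content boundaries can only be derived from the page content itself, e.g. from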

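  • Form XObjects:

    A form XObject is a PDF content stream that is a self-contained description of any sequence of graphics objects (including path objects, text objects, and sampled images). A form XObject may be painted multiple times, either on several pages or at several locations on the same page, and produces the same results each time, subject only to the graphics state at the time it is invoked.

  • Clip paths:

    The graphics state shall contain a current clipping path that limits the regions of the page affected by painting operators. The closed subpaths of this path shall define the area that can be painted. Marks falling inside this area shall be applied to the page; those falling outside it shall not be.

  • ...

To find any of them, one has to parse the page content, look for the appropriate operations, and calculate the resulting boundaries.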
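To make "look for the appropriate operations" a bit more concrete, here is a deliberately naive sketch that scans a content stream given as text for the operator pattern "x y w h re W" (or "W*"), i.e. a rectangle that is used as clip path. The class name, the method name, and the sample stream are made up for illustration. It splits on whitespace only and ignores the current transformation matrix, string and inline image contents, and all non-rectangular paths, so it is no substitute for a real content stream parser.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

final class ClipRectScanner {
    /**
     * Collects every rectangle that is immediately followed by a clip
     * operator ("re W" or "re W*"). Toy tokenizer: whitespace-separated
     * tokens only, no CTM handling, no string or inline image handling.
     */
    static List<Rectangle2D> findClipRectangles(String content) {
        List<Rectangle2D> result = new ArrayList<>();
        String[] tokens = content.trim().split("\\s+");
        for (int i = 4; i + 1 < tokens.length; i++) {
            boolean clipsRectangle = tokens[i].equals("re")
                    && (tokens[i + 1].equals("W") || tokens[i + 1].equals("W*"));
            if (!clipsRectangle) {
                continue;
            }
            try {
                double x = Double.parseDouble(tokens[i - 4]);
                double y = Double.parseDouble(tokens[i - 3]);
                double w = Double.parseDouble(tokens[i - 2]);
                double h = Double.parseDouble(tokens[i - 1]);
                // "re" allows negative width and height; normalize the frame
                Rectangle2D.Double rect = new Rectangle2D.Double();
                rect.setFrameFromDiagonal(x, y, x + w, y + h);
                result.add(rect);
            } catch (NumberFormatException e) {
                // operands are not plain numbers; skip this occurrence
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical content stream fragment, not taken from the sample PDFs
        String content = "q 28.32 660.22 537.6 153.46 re W n BT /F1 12 Tf (Hello) Tj ET Q";
        for (Rectangle2D rect : findClipRectangles(content)) {
            System.out.println("Clip rectangle height: " + rect.getHeight());
        }
    }
}
```

A rectangle that is merely filled ("re f") is not reported, because only the clip operators are matched.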

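In the OP's case

Each of your sample PDFs explicitly defines only one page boundary, the MediaBox. Thus, all the other PDF page boundaries (CropBox, BleedBox, TrimBox, ArtBox) default to it. So it is no wonder that in your attempts

    each (PDRectangle) returns the full height of the page: 842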
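The defaulting can be illustrated with a small model that does not use PDFBox at all. According to the PDF specification, a missing CropBox defaults to the MediaBox, and a missing BleedBox, TrimBox, or ArtBox defaults to the CropBox. The class and method names below are made up for illustration, and the page width of 595 is an assumption (A4); only the height of 842 is known from the question.

```java
import java.util.HashMap;
import java.util.Map;

final class PageBoxDefaults {
    /** Resolves a page boundary by name, applying the defaults of the PDF specification. */
    static double[] resolve(Map<String, double[]> explicitBoxes, String name) {
        double[] box = explicitBoxes.get(name);
        if (box != null) {
            return box;
        }
        switch (name) {
            case "CropBox":
                // CropBox defaults to the MediaBox
                return resolve(explicitBoxes, "MediaBox");
            case "BleedBox":
            case "TrimBox":
            case "ArtBox":
                // these three default to the CropBox
                return resolve(explicitBoxes, "CropBox");
            case "MediaBox":
                throw new IllegalArgumentException("MediaBox is required");
            default:
                throw new IllegalArgumentException("Unknown page boundary: " + name);
        }
    }

    /** Height of a box given as [lower left x, lower left y, upper right x, upper right y]. */
    static double height(double[] box) {
        return Math.abs(box[3] - box[1]);
    }

    public static void main(String[] args) {
        Map<String, double[]> boxes = new HashMap<>();
        boxes.put("MediaBox", new double[] {0, 0, 595, 842}); // width 595 is assumed
        for (String name : new String[] {"MediaBox", "CropBox", "BleedBox", "TrimBox", "ArtBox"}) {
            System.out.println(name + " height: " + height(resolve(boxes, name)));
        }
    }
}
```

With only a MediaBox defined, every boundary resolves to the same rectangle, which is why all of them report 842.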

Neither of them contains form XObjects, but both use clip paths.

  • In the case of test-pdf4.pdf:

    Start at: 28.31999969482422, 813.6799926757812
    Line to: 565.9199829101562, 813.6799926757812
    Line to: 565.9199829101562, 660.2196655273438
    Line to: 28.31999969482422, 660.2196655273438
    Line to: 28.31999969482422, 813.6799926757812
    

    (This might match the sketch in your question.)

  • In the case of test-pdf5.pdf:

    Start at: 23.0, 34.0
    Line to: 572.0, 34.0
    Line to: 572.0, -751.0
    Line to: 23.0, -751.0
    Line to: 23.0, 34.0
    

    Start at: 23.0, 819.0
    Line to: 572.0, 819.0
    Line to: 572.0, 34.0
    Line to: 23.0, 34.0
    Line to: 23.0, 819.0
    

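Because of the match with your sketch, I assume that Illustrator treats everything drawn while a non-trivial clip path is in effect as a compound element, with the clip path as its border.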
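If the clip path is taken as the border of the content, the content height is the height of the path's bounding box. The following is a minimal sketch using only java.awt.geom, fed with the test-pdf4.pdf coordinates listed above; the class, constant, and method names are made up for illustration.

```java
import java.awt.geom.Path2D;
import java.awt.geom.Rectangle2D;

final class ClipPathHeight {
    /** The clip path of test-pdf4.pdf as listed above (start point followed by the line targets). */
    static final double[][] TEST_PDF4_CLIP = {
        {28.31999969482422, 813.6799926757812},
        {565.9199829101562, 813.6799926757812},
        {565.9199829101562, 660.2196655273438},
        {28.31999969482422, 660.2196655273438},
        {28.31999969482422, 813.6799926757812}
    };

    /** Builds a closed polygon from the points and returns its bounding box. */
    static Rectangle2D bounds(double[][] points) {
        Path2D.Double path = new Path2D.Double();
        path.moveTo(points[0][0], points[0][1]);
        for (int i = 1; i < points.length; i++) {
            path.lineTo(points[i][0], points[i][1]);
        }
        path.closePath();
        return path.getBounds2D();
    }

    public static void main(String[] args) {
        Rectangle2D box = bounds(TEST_PDF4_CLIP);
        System.out.println("Content height: " + box.getHeight());
        System.out.println("Content width:  " + box.getWidth());
    }
}
```

For test-pdf4.pdf this gives a height of about 153.46 units instead of the 842 units of the MediaBox.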

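Using PDFBox

Finding clip paths

I used PDFBox to find the clip paths mentioned above. I used the current SNAPSHOT of version 2.0.0, which is in development at the time of writing, because the required API is much improved compared to the current release, 1.8.8.

I extended PDFGraphicsStreamEngine into a ClipPathFinder class: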

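import java.awt.geom.Point2D;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Package names as in the released PDFBox 2.0.x; early 2.0.0 snapshots may differ.
import org.apache.pdfbox.contentstream.PDFGraphicsStreamEngine;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.graphics.image.PDImage;

public class ClipPathFinder extends PDFGraphicsStreamEngine implements Iterable<Path>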
{
    public ClipPathFinder(PDPage page)
    {
        super(page);
    }

    /** Processes the page content and collects every path that is used for clipping. */
    public void findClipPaths() throws IOException
    {
        processPage(getPage());
    }

    //
    // PDFGraphicsStreamEngine overrides
    //

    @Override
    public void appendRectangle(Point2D p0, Point2D p1, Point2D p2, Point2D p3) throws IOException
    {
        startPathIfNecessary();
        currentPath.appendRectangle(toFloat(p0), toFloat(p1), toFloat(p2), toFloat(p3));
    }

    @Override
    public void drawImage(PDImage pdImage) throws IOException { }

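    @Override
    public void clip(int windingRule) throws IOException
    {
        // A clip operator without a preceding path is invalid; ignore it instead of failing
        if (currentPath == null)
            return;
        // The path built so far becomes a clip path: record it and start afresh
        currentPath.complete(windingRule);
        paths.add(currentPath);
        currentPath = null;
    }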

    @Override
    public void moveTo(float x, float y) throws IOException
    {
        startPathIfNecessary();
        currentPath.moveTo(x, y);
    }

    @Override
    public void lineTo(float x, float y) throws IOException
    {
        currentPath.lineTo(x, y);
    }

    @Override
    public void curveTo(float x1, float y1, float x2, float y2, float x3, float y3) throws IOException
    {
        currentPath.curveTo(x1, y1, x2, y2, x3, y3);
    }

    @Override
    public Point2D.Float getCurrentPoint() throws IOException
    {
        // No path under construction means there is no current point
        return currentPath == null ? null : currentPath.getCurrentPoint();
    }

    @Override
    public void closePath() throws IOException
    {
        currentPath.closePath();
    }

    @Override
    public void endPath() throws IOException
    {
        currentPath = null;
    }

    @Override
    public void strokePath() throws IOException
    {
        currentPath = null;
    }

    @Override
    public void fillPath(int windingRule) throws IOException
    {
        currentPath = null;
    }

    @Override
    public void fillAndStrokePath(int windingRule) throws IOException
    {
        currentPath = null;
    }

    @Override
    public void shadingFill(COSName shadingName) throws IOException
    {
        currentPath = null;
    }

    void startPathIfNecessary()
    {
        if (currentPath == null)
            currentPath = new Path();
    }

    Point2D.Float toFloat(Point2D p)
    {
        if (p == null || (p instanceof Point2D.Float))
        {
            return (Point2D.Float)p;
        }
        return new Point2D.Float((float)p.getX(), (float)p.getY());
    }

    //
    // Iterable<Path> implementation
    //
    public Iterator<Path> iterator()
    {
        return paths.iterator();
    }

    Path currentPath = null;
    final List<Path> paths = new ArrayList<Path>();
}

It uses this helper class to represent paths:

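import java.awt.geom.Point2D;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class Path implements Iterable<Path.SubPath>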
{
    public static class Segment
    {
        Segment(Point2D.Float start, Point2D.Float end)
        {
            this.start = start;
            this.end = end;
        }

        public Point2D.Float getStart()
        {
            return start;
        }

        public Point2D.Float getEnd()
        {
            return end;
        }

        final Point2D.Float start, end; 
    }

    public class SubPath implements Iterable<Segment>
    {
        public class Line extends Segment
        {
            Line(Point2D.Float start, Point2D.Float end)
            {
                super(start, end);
            }

            //
            // Object override
            //
            @Override
            public String toString()
            {
                StringBuilder builder = new StringBuilder();
                builder.append("    Line to: ")
                       .append(end.getX())
                       .append(", ")
                       .append(end.getY())
                       .append('\n');
                return builder.toString();
            }
        }

        public class Curve extends Segment
        {
            Curve(Point2D.Float start, Point2D.Float control1, Point2D.Float control2, Point2D.Float end)
            {
                super(start, end);
                this.control1 = control1;
                this.control2 = control2;
            }

            public Point2D getControl1()
            {
                return control1;
            }

            public Point2D getControl2()
            {
                return control2;
            }

            //
            // Object override
            //
            @Override
            public String toString()
            {
                StringBuilder builder = new StringBuilder();
                builder.append("    Curve to: ")
                       .append(end.getX())
                       .append(", ")
                       .append(end.getY())
                       .append(" with Control1: ")
                       .append(control1.getX())
                       .append(", ")
                       .append(control1.getY())
                       .append(" and Control2: ")
                       .append(control2.getX())
                       .append(", ")
                       .append(control2.getY())
                       .append('\n');
                return builder.toString();
            }

            final Point2D control1, control2; 
        }

        SubPath(Point2D.Float start)
        {
            this.start = start;
            currentPoint = start;
        }

        public Point2D getStart()
        {
            return start;
        }

        void lineTo(float x, float y)
        {
            Point2D.Float end = new Point2D.Float(x, y);
            segments.add(new Line(currentPoint, end));
            currentPoint = end;
        }

        void curveTo(float x1, float y1, float x2, float y2, float x3, float y3)
        {
            Point2D.Float control1 = new Point2D.Float(x1, y1);
            Point2D.Float control2 = new Point2D.Float(x2, y2);
            Point2D.Float end = new Point2D.Float(x3, y3);
            segments.add(new Curve(currentPoint, control1, control2, end));
            currentPoint = end;
        }

        void closePath()
        {
            closed = true;
            currentPoint = start;
        }

        //
        // Iterable<Segment> implementation
        //
        public Iterator<Segment> iterator()
        {
            return segments.iterator();
        }

        //
        // Object override
        //
        @Override
        public String toString()
        {
            StringBuilder builder = new StringBuilder();
            builder.append("  {\n    Start at: ")
                   .append(start.getX())
                   .append(", ")
                   .append(start.getY())
                   .append('\n');
            for (Segment segment : segments)
                builder.append(segment);
            if (closed)
                builder.append("    Closed\n");
            builder.append("  }\n");
            return builder.toString();
        }

        boolean closed = false;
        final Point2D.Float start;
        final List<Segment> segments = new ArrayList<Path.Segment>();
    }

    public class Rectangle extends SubPath
    {
        Rectangle(Point2D.Float p0, Point2D.Float p1, Point2D.Float p2, Point2D.Float p3)
        {
            super(p0);
            lineTo((float)p1.getX(), (float)p1.getY());
            lineTo((float)p2.getX(), (float)p2.getY());
            lineTo((float)p3.getX(), (float)p3.getY());
            closePath();
        }

        //
        // Object override
        //
        @Override
        public String toString()
        {
            StringBuilder builder = new StringBuilder();
            builder.append("  {\n    Rectangle\n    Start at: ")
                   .append(start.getX())
                   .append(", ")
                   .append(start.getY())
                   .append('\n');
            for (Segment segment : segments)
                builder.append(segment);
            if (closed)
                builder.append("    Closed\n");
            builder.append("  }\n");
            return builder.toString();
        }
    }

    public int getWindingRule()
    {
        return windingRule;
    }

    void complete(int windingRule)
    {
        finishSubPath();
        this.windingRule = windingRule;
    }

    void appendRectangle(Point2D.Float p0, Point2D.Float p1, Point2D.Float p2, Point2D.Float p3) throws IOException
    {
        finishSubPath();
        currentSubPath = new Rectangle(p0, p1, p2, p3);
        finishSubPath();
    }

    void moveTo(float x, float y) throws IOException
    {
        finishSubPath();
        currentSubPath = new SubPath(new Point2D.Float(x, y));
    }

    void lineTo(float x, float y) throws IOException
    {
        currentSubPath.lineTo(x, y);
    }

    void curveTo(float x1, float y1, float x2, float y2, float x3, float y3) throws IOException
    {
        currentSubPath.curveTo(x1, y1, x2, y2, x3, y3);
    }

    Point2D.Float getCurrentPoint() throws IOException
    {
        return currentPoint;
    }

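    void closePath() throws IOException
    {
        // A rectangle is already closed and finished, so a following close has nothing left to do
        if (currentSubPath != null)
        {
            currentSubPath.closePath();
            finishSubPath();
        }
    }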

    void finishSubPath()
    {
        if (currentSubPath != null)
        {
            subPaths.add(currentSubPath);
            currentSubPath = null;
        }
    }

    //
    // Iterable<Path.SubPath> implementation
    //
    public Iterator<SubPath> iterator()
    {
        return subPaths.iterator();
    }

    //
    // Object override
    //
    @Override
    public String toString()
    {
        StringBuilder builder = new StringBuilder();
        builder.append("{\n  Winding: ")
               .append(windingRule)
               .append('\n');
        for (SubPath subPath : subPaths)
            builder.append(subPath);
        builder.append("}\n");
        return builder.toString();
    }

    Point2D.Float currentPoint = null;
    SubPath currentSubPath = null;
    int windingRule = -1;
    final List<SubPath> subPaths = new ArrayList<Path.SubPath>();
}

The ClipPathFinder is used like this:

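// PDFRESOURCE (a File or InputStream) and PAGENUMBER (a zero-based page index) are placeholders.
// The single-argument PDDocument.load avoids the ambiguous null argument in PDFBox 2.0.x.
try (PDDocument document = PDDocument.load(PDFRESOURCE))
{
    PDPage page = document.getPage(PAGENUMBER);
    ClipPathFinder finder = new ClipPathFinder(page);
    finder.findClipPaths();

    for (Path path : finder)
    {
        System.out.println(path);
    }
}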
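Once the clip paths are known, a content height can be derived from their bounding boxes. As the test-pdf5.pdf output shows, a clip path may reach beyond the page (down to y = -751), so it seems sensible to intersect each bounding box with the MediaBox first. The following sketch does that with plain java.awt.geom rectangles. It assumes that each clip path limits its own part of the content (so the visible regions are united rather than intersected) and that the MediaBox is [0 0 595 842]; the width of 595 and all class and method names are assumptions made for illustration.

```java
import java.awt.geom.Rectangle2D;
import java.util.Arrays;
import java.util.List;

final class VisibleContentBounds {
    /** Creates a normalized rectangle from two opposite corners. */
    static Rectangle2D fromCorners(double x1, double y1, double x2, double y2) {
        Rectangle2D.Double rect = new Rectangle2D.Double();
        rect.setFrameFromDiagonal(x1, y1, x2, y2);
        return rect;
    }

    /**
     * Clamps each clip bounding box to the media box and unites the visible parts.
     * Returns null if no clip box overlaps the media box.
     */
    static Rectangle2D visibleBounds(Rectangle2D mediaBox, List<Rectangle2D> clipBounds) {
        Rectangle2D result = null;
        for (Rectangle2D clip : clipBounds) {
            Rectangle2D visible = clip.createIntersection(mediaBox);
            if (visible.isEmpty()) {
                continue; // entirely outside the page
            }
            result = (result == null) ? visible : result.createUnion(visible);
        }
        return result;
    }

    public static void main(String[] args) {
        Rectangle2D mediaBox = fromCorners(0, 0, 595, 842); // width 595 is assumed
        // Bounding boxes of the two test-pdf5.pdf clip paths listed earlier
        List<Rectangle2D> clips = Arrays.asList(
                fromCorners(23, -751, 572, 34),
                fromCorners(23, 34, 572, 819));
        Rectangle2D visible = visibleBounds(mediaBox, clips);
        System.out.println("Visible content height: " + visible.getHeight());
    }
}
```

For the two test-pdf5.pdf clip paths this yields a visible region from y = 0 to y = 819, i.e. a height of 819. With the classes above, the corner coordinates would come from iterating over the SubPath segments of each Path returned by the ClipPathFinder.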