How can the following algorithm (Java) be tuned further to make it faster?

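Time: 2018-03-27 19:31:06

Tags: java performance opencv

Introduction

I have implemented the algorithm from this Paper (Efficiently selecting spatially distributed keypoints for visual tracking) in Java. What I have not implemented is the following suggestion from the paper (page 3, end of section 5):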

  

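  "The relatively expensive cell covering operation can be substantially sped up by using a single bit to store the state of each cell of Gr. This enables bitwise OR operations to be used to 'cover' contiguous patches, using precomputed bitmasks that are applied at a given bit offset position."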

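Testing, benchmarking and profiling

  • The algorithm is tested by creating 12,000 random points and running the algorithm once with an initial radius. The JMH test is attached.
  • I profiled it with JProfiler, looking at object memory throughput (not many objects are actually created), CPU (this is the bottleneck) and GC (nothing going on here). The CPU bottleneck is currently in the bresenhamFilledCircle method (which is where all the action happens). Of the 12,000 points, about 1,500 are returned from the main algorithm, so bresenhamFilledCircle is executed about 1,500 * 6,700 = roughly 10 million times per second. That is about 0.1 microseconds (100 nanoseconds) per call. Pretty fast, but there should be room to make it go faster...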

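What I have done so far

  • Started with a basic brute-force algorithm: two nested loops over rows and columns, and a standard Pythagorean theorem check to decide whether a cell is inside the circle, "drawing" the circle into a boolean[][].
    throughput ~3 500 ops/sec
  • Switched to filling with System.arraycopy instead of brute force.
    throughput ~5 600 ops/sec
  • Optimized array initialization (using a cache).
    throughput ~6 000 ops/sec
  • Added a margin around the rows and columns to avoid bounds checks during the algorithm.
    throughput ~6 500 ops/sec
  • Switched to Bresenham's circle algorithm (slightly modified to fill the circle) to avoid the "complex" Pythagorean check.
    throughput ~6 500 ops/sec. :(
  • Switched from a 2D array to a 1D array.
    throughput ~6 700 ops/sec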
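
For reference, the brute-force starting point from the first bullet can be sketched roughly as follows. The names BruteForceCircle, fill and countSet are made up for illustration, and the sketch assumes the grid already includes a margin of `radius` cells on every side, so no bounds checks are needed.

```java
final class BruteForceCircle {
    /**
     * Marks every cell whose offset (dRow, dCol) from the centre satisfies
     * dRow^2 + dCol^2 <= radius^2. Assumes the circle fits inside the grid.
     */
    static void fill(boolean[][] grid, int centerRow, int centerCol, int radius) {
        int radiusSquared = radius * radius;
        for (int dRow = -radius; dRow <= radius; dRow++) {
            for (int dCol = -radius; dCol <= radius; dCol++) {
                // Pythagorean check on squared distances, so no sqrt is needed.
                if (dRow * dRow + dCol * dCol <= radiusSquared) {
                    grid[centerRow + dRow][centerCol + dCol] = true;
                }
            }
        }
    }

    /** Counts the covered cells; only used to inspect the result. */
    static int countSet(boolean[][] grid) {
        int count = 0;
        for (boolean[] row : grid) {
            for (boolean cell : row) {
                if (cell) {
                    count++;
                }
            }
        }
        return count;
    }
}
```

Every call visits all (2 * radius + 1)^2 cells of the bounding box, which is the per-cell work the later steps try to avoid.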

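Now I am out of ideas, apart from converting the boolean[] to a byte[] and using bitmasks for set/get, if I have understood the suggestion in the paper correctly.

Any challengers?

Here is the JMH test: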
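
If the quoted suggestion is read that way, storing one bit per cell means a whole horizontal run can be covered with a few OR operations instead of one write per cell. The sketch below is only one possible interpretation: it uses long[] words rather than byte[] (an assumption, chosen so that one operation covers 64 cells), it computes the masks on the fly instead of looking them up in a precomputed table, and BitGrid, setRange and isSet are hypothetical names that do not come from the paper.

```java
final class BitGrid {
    private final long[] words;
    private final int wordsPerRow;

    BitGrid(int rowCount, int colCount) {
        // Round each row up to a whole number of 64-bit words.
        this.wordsPerRow = (colCount + 63) >>> 6;
        this.words = new long[rowCount * wordsPerRow];
    }

    /** Sets the cells [colStart, colStart + length) of the given row. */
    void setRange(int row, int colStart, int length) {
        if (length <= 0) {
            return;
        }
        int colEnd = colStart + length - 1; // inclusive
        int base = row * wordsPerRow;
        int firstWord = base + (colStart >>> 6);
        int lastWord = base + (colEnd >>> 6);
        long firstMask = -1L << (colStart & 63);       // bits (colStart % 64)..63
        long lastMask = -1L >>> (63 - (colEnd & 63));  // bits 0..(colEnd % 64)
        if (firstWord == lastWord) {
            words[firstWord] |= firstMask & lastMask;
            return;
        }
        words[firstWord] |= firstMask;
        for (int w = firstWord + 1; w < lastWord; w++) {
            words[w] = -1L; // words fully inside the run
        }
        words[lastWord] |= lastMask;
    }

    boolean isSet(int row, int col) {
        return (words[row * wordsPerRow + (col >>> 6)] & (1L << (col & 63))) != 0;
    }
}
```

With a radius of 10 every run is at most 20 cells wide, so it touches at most two words per row. Whether that actually beats System.arraycopy on a boolean[] would have to be measured with the JMH benchmark.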

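import java.util.ArrayList;
import java.util.List;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

public class KeyPointFilterBenchmark {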
    private static final int DEFAULT_RADIUS = 10;

    @Benchmark
    public List<OpenCVKeyPoint> benchmarkFilterByRadius(KeyPointFilterState state) {
        return state.filter.filterByRadius(DEFAULT_RADIUS, state.list);
    }

    @State(Scope.Thread)
    public static class KeyPointFilterState {
        private static final int NUMBER_OF_POINTS = 12_000;
        private static final int IMAGE_WIDTH = 640;
        private static final int IMAGE_HEIGHT = 480;
        private static final int RESPONSE_RANGE = 255;
        private List<OpenCVKeyPoint> list;
        private KeyPointFilter filter;

        @Setup(Level.Trial)
        public void doSetup() {
            this.list = new ArrayList<>();
            for (int i = 0; i < NUMBER_OF_POINTS; i++) {
                double x = Math.random() * IMAGE_WIDTH;
                double y = Math.random() * IMAGE_HEIGHT;
                float response = (float) (Math.random() * RESPONSE_RANGE);
                list.add(new OpenCVKeyPoint(x, y, response));
            }
            this.filter = new KeyPointFilter(IMAGE_WIDTH, IMAGE_HEIGHT);
        }
    }
}

The current implementation:

import java.util.ArrayList;
import java.util.List;

public class KeyPointFilter {
    private boolean[] matrix;
    private final int rowCount;
    private final int colCount;
    private int matrixColCount;
    private int matrixRowCount;
    private boolean[] ones;
    // -1 means init() has not run yet, so a first call with radius 0 still allocates the matrix.
    private int radiusInitialized = -1;

    public KeyPointFilter(int colCount, int rowCount) {
        this.colCount = colCount;
        this.rowCount = rowCount;
    }

    void init(int radius) {
        if (radiusInitialized == radius) {
            // Already initialized, just reset.
            this.matrix = new boolean[matrixRowCount * matrixColCount];
            return;
        }
        this.matrixRowCount = rowCount + radius * 2;
        this.matrixColCount = colCount + radius * 2;
        this.matrix = new boolean[matrixRowCount * matrixColCount];
        // Initialize a one array, to use in the coverAround arraycopy optimization.
        this.ones = new boolean[matrixColCount];
        for (int i = 0; i < ones.length; i++) {
            ones[i] = true;
        }
        radiusInitialized = radius;
    }

    public List<OpenCVKeyPoint> filterByRadius(int radius, List<OpenCVKeyPoint> input) {
        init(radius);
        List<OpenCVKeyPoint> filtered = new ArrayList<>();
        // Eliminating by covering
        for (OpenCVKeyPoint point : input) {
            int col = (int) point.getXPos();
            int row = (int) point.getYPos();
            if (!isSet(col, row)) {
                bresenhamFilledCircle(col, row, radius);
                filtered.add(point);
            }
        }
        return filtered;
    }

    void bresenhamFilledCircle(int col, int row, int radius) {
        // CHECKSTYLE IGNORE MagicNumber FOR NEXT 1 LINES.
        int d = (5 - radius * 4) / 4;
        int x = 0;
        int y = radius;
        int rowOffset = radius + row;
        int colOffset = radius + col;
        do {
            //Since we are filling a circle, we fill using System.arraycopy, from left to right.
            int yStart = colOffset - y;
            int yLength = 2 * y;
            // Row a bottom
            System.arraycopy(ones, 0, matrix, getIndex(rowOffset - x, yStart), yLength);
            if (x != 0) {
                int xStart = colOffset - x;
                int xLength = 2 * x;
                // Row a top
                System.arraycopy(ones, 0, matrix, getIndex(rowOffset + x, yStart), yLength);
                // Row b bottom
                System.arraycopy(ones, 0, matrix, getIndex(rowOffset - y, xStart), xLength);
                // Row b top
                System.arraycopy(ones, 0, matrix, getIndex(rowOffset + y, xStart), xLength);
            }
            if (d < 0) {
                d += 2 * x + 1;
            } else {
                d += 2 * (x - y) + 1;
                y--;
            }
            x++;
        } while (x <= y);
    }

    private int getIndex(int row, int col) {
        return row * matrixColCount + col;
    }

    private void debugArray() {
        StringBuilder actualResult = new StringBuilder();
        for (int row = 0; row < getRowCount(); row++) {
            for (int col = 0; col < getColCount(); col++) {
                actualResult.append(isSet(col, row) ? '1' : '0');
            }
            actualResult.append('\n');
        }
        System.out.println(actualResult);
    }

    public boolean isSet(int col, int row) {
        return matrix[getIndex(row + radiusInitialized, col + radiusInitialized)];
    }

    int getRowCount() {
        return rowCount;
    }

    int getColCount() {
        return colCount;
    }
}

Plus the keypoint class it uses:

public class OpenCVKeyPoint {
    private final double xPos;
    private final double yPos;
    private final float response;

    public OpenCVKeyPoint(double xPos, double yPos, float response) {
        this.xPos = xPos;
        this.yPos = yPos;
        this.response = response;
    }

    public float getResponse() {
        return response;
    }

    public double getXPos() {
        return xPos;
    }

    public double getYPos() {
        return yPos;
    }
}

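2 answers:

Answer 0: (score: 0)

You could cache more of the calculations and inline the functions as far as possible.

Try replacing filterByRadius with this and see whether there is any improvement: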

public List<OpenCVKeyPoint> filterByRadius(final int radius, List<OpenCVKeyPoint> input) {
    init(radius);

    // Possibly give a hint to the arraylist on how much space to allocate from the start.
    List<OpenCVKeyPoint> filtered = new ArrayList<>();

    // calculate once
    final int d_init = (5 - radius * 4) / 4;

    // Eliminating by covering
    for (OpenCVKeyPoint point : input) {

        // FIXME do the points need to be doubles, only to be cast to int?
        int col = (int) point.getXPos();
        int row = (int) point.getYPos();

        if (!isSet(col, row)) {
            final int rowOffset = (radius + row) * matrixColCount;
            final int colOffset = radius + col;

            int d = d_init;
            int x = 0;
            int y = radius;
            do {
                final int yStart = colOffset - y;
                final int yLength = 2 * y;

                final int xByMatrixColCount = x * matrixColCount;
                final int rowOffsetPlusYStart = rowOffset + yStart;

                // Since we are filling a circle, we fill using System.arraycopy, from left to right.

                // Row a bottom
                System.arraycopy(ones, 0, matrix, (rowOffsetPlusYStart - xByMatrixColCount),
                        yLength);
                if (x != 0) {
                    // Row a top
                    System.arraycopy(ones, 0, matrix, (rowOffsetPlusYStart + xByMatrixColCount),
                            yLength);

                    // -----
                    final int xLength = 2 * x;
                    final int yByMatrixColCount = y * matrixColCount;
                    final int rowOffsetPlusXStart = rowOffset + colOffset - x;

                    // Row b bottom
                    System.arraycopy(ones, 0, matrix, (rowOffsetPlusXStart - yByMatrixColCount),
                            xLength);

                    // Row b top
                    System.arraycopy(ones, 0, matrix, (rowOffsetPlusXStart + yByMatrixColCount),
                            xLength);
                }
                if (d < 0) {
                    d += 2 * x + 1;
                } else {
                    d += 2 * (x - y) + 1;
                    y--;
                }
                x++;
            } while (x <= y);

            filtered.add(point);
        }
    }
    return filtered;
}

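This probably won't be a big improvement, but you asked for faster, and I believe this will be faster, although I have no measurements to back that up. If you benchmark this, I would love to know the results!

Answer 1: (score: 0)

So, I came up with a nice optimization. The generic Bresenham algorithm paints the same cells several times near the top and bottom of the circle. With a custom strategy we can instead have a dedicated paint routine for a specific radius, e.g. 10, that paints each row exactly once and needs almost no calculation. A custom strategy for a circle of radius 10 looks like this: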

System.arraycopy(ones, 0, matrix, getIndex(row, col + 7), 6);
System.arraycopy(ones, 0, matrix, getIndex(row + 1, col + 4), 12);
System.arraycopy(ones, 0, matrix, getIndex(row + 2, col + 3), 14);
System.arraycopy(ones, 0, matrix, getIndex(row + 3, col + 2), 16);
System.arraycopy(ones, 0, matrix, getIndex(row + 4, col + 1), 18);
System.arraycopy(ones, 0, matrix, getIndex(row + 5, col + 1), 18);
System.arraycopy(ones, 0, matrix, getIndex(row + 6, col + 1), 18);
System.arraycopy(ones, 0, matrix, getIndex(row + 7, col), 20);
System.arraycopy(ones, 0, matrix, getIndex(row + 8, col), 20);
System.arraycopy(ones, 0, matrix, getIndex(row + 9, col), 20);
System.arraycopy(ones, 0, matrix, getIndex(row + 10, col), 20);
System.arraycopy(ones, 0, matrix, getIndex(row + 11, col), 20);
System.arraycopy(ones, 0, matrix, getIndex(row + 12, col), 20);
System.arraycopy(ones, 0, matrix, getIndex(row + 13, col), 20);
System.arraycopy(ones, 0, matrix, getIndex(row + 14, col + 1), 18);
System.arraycopy(ones, 0, matrix, getIndex(row + 15, col + 1), 18);
System.arraycopy(ones, 0, matrix, getIndex(row + 16, col + 1), 18);
System.arraycopy(ones, 0, matrix, getIndex(row + 17, col + 2), 16);
System.arraycopy(ones, 0, matrix, getIndex(row + 18, col + 3), 14);
System.arraycopy(ones, 0, matrix, getIndex(row + 19, col + 4), 12);
System.arraycopy(ones, 0, matrix, getIndex(row + 20, col + 7), 6);
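
The table above is written out by hand for radius 10. A possible way to get the same table for other radii is to run the question's Bresenham loop once per radius and record, for each row of the bounding box, the leftmost column offset and the run length. This is only a sketch; CircleSpans, compute and widen are hypothetical names. For radius 10 it yields the same 21 (offset, length) pairs as the listing above.

```java
final class CircleSpans {
    /**
     * Returns one {colOffset, length} pair for each of the 2 * radius + 1 rows of the
     * circle's bounding box, top to bottom, using the same loop as
     * bresenhamFilledCircle in the question.
     */
    static int[][] compute(int radius) {
        int[][] spans = new int[2 * radius + 1][2];
        int d = (5 - radius * 4) / 4;
        int x = 0;
        int y = radius;
        do {
            // Rows at distance x from the centre row get a run of 2 * y cells.
            widen(spans, radius - x, radius - y, 2 * y);
            widen(spans, radius + x, radius - y, 2 * y);
            // Rows at distance y from the centre row get a run of 2 * x cells.
            widen(spans, radius - y, radius - x, 2 * x);
            widen(spans, radius + y, radius - x, 2 * x);
            if (d < 0) {
                d += 2 * x + 1;
            } else {
                d += 2 * (x - y) + 1;
                y--;
            }
            x++;
        } while (x <= y);
        return spans;
    }

    // Keeps only the widest run per row. A run of 2 * k cells always starts
    // at column radius - k, so a wider run contains every narrower one.
    private static void widen(int[][] spans, int row, int colOffset, int length) {
        if (length > spans[row][1]) {
            spans[row][0] = colOffset;
            spans[row][1] = length;
        }
    }
}
```

With the table cached per radius, painting a circle is one System.arraycopy(ones, 0, matrix, getIndex(row + i, col + spans[i][0]), spans[i][1]) call for each row index i. I have not benchmarked this generated variant against the hand-written one.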

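I benchmarked again and throughput went up, now reaching ~8 200 ops/sec.

It could probably go higher if I introduced threads and processed the list in parallel, but for now this throughput is sufficient.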