Question

Introduction

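I have implemented the algorithm from this paper (Efficiently selecting spatially distributed keypoints for visual tracking) in Java. The one suggestion from the paper that I have not implemented is the following (page 3, end of section 5):

The relatively expensive cell-covering operation can be sped up substantially by using a single bit to store the state of each cell of G_r. This makes it possible to "cover" consecutive cells with bitwise OR operations, implementing patch covering with precomputed bitmasks applied at a given bit offset position.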
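As one possible reading of this suggestion (not necessarily the paper's exact scheme), each grid row can be stored as 64-bit long words, so covering a run of consecutive cells becomes at most two masked ORs plus whole-word stores. BitCoverGrid and its methods are hypothetical names. The masks are derived with shifts here instead of being looked up in a precomputed table; a filled circle would be covered with one coverSpan call per row.

```java
/**
 * Hypothetical sketch of the one-bit-per-cell idea: each grid row is stored
 * as whole 64-bit words, and a run of consecutive cells is covered with
 * bitwise ORs instead of per-cell writes.
 */
final class BitCoverGrid {
    private final int wordsPerRow;
    private final long[] bits;

    BitCoverGrid(int cols, int rows) {
        this.wordsPerRow = (cols + 63) >>> 6; // round each row up to whole words
        this.bits = new long[wordsPerRow * rows];
    }

    boolean isSet(int col, int row) {
        long word = bits[row * wordsPerRow + (col >>> 6)];
        return (word & (1L << col)) != 0; // Java uses only the low 6 bits of a long shift count
    }

    /** Covers cells [fromCol, toCol) of one row. */
    void coverSpan(int row, int fromCol, int toCol) {
        if (fromCol >= toCol) {
            return;
        }
        int base = row * wordsPerRow;
        int firstWord = fromCol >>> 6;
        int lastWord = (toCol - 1) >>> 6;
        long firstMask = -1L << fromCol;                    // bits fromCol%64 .. 63
        long lastMask = -1L >>> (63 - ((toCol - 1) & 63));  // bits 0 .. (toCol-1)%64
        if (firstWord == lastWord) {
            bits[base + firstWord] |= firstMask & lastMask; // run fits inside one word
            return;
        }
        bits[base + firstWord] |= firstMask;
        for (int w = firstWord + 1; w < lastWord; w++) {
            bits[base + w] = -1L;                           // fully covered middle words
        }
        bits[base + lastWord] |= lastMask;
    }

    public static void main(String[] args) {
        BitCoverGrid grid = new BitCoverGrid(640, 480);
        grid.coverSpan(1, 10, 140);
        System.out.println(grid.isSet(100, 1)); // true
    }
}
```

A row of 640 cells then occupies 10 longs instead of 640 booleans, so a radius-10 span touches one or two words.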

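Testing, benchmarking and profiling

The algorithm is tested by creating 12 000 random points and running it once with an initial radius. The JMH test is attached below.
I have profiled continuously with JProfiler, looking at object memory throughput (not many objects are actually created), CPU (it is CPU-bound) and GC (nothing much going on there). The CPU bottleneck is currently the bresenhamFilledCircle method, which is where all the action happens. Of the 12 000 points, about 1 500 are returned by the main algorithm, so bresenhamFilledCircle is executed about 1 500 * 6 700 ≈ 10 million times per second. That is about 0.1 microseconds (100 nanoseconds) per call. Quite fast, but there should be room to make it go faster...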
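Spelled out, the per-call figure follows from the benchmark throughput. Since the measured time per benchmark operation also includes the isSet checks for the roughly 10 500 rejected points and the list handling, 100 ns is an upper bound on the average cost of one fill rather than its exact cost:

```latex
\[
1\,500\ \tfrac{\text{fills}}{\text{op}} \times 6\,700\ \tfrac{\text{ops}}{\text{s}} \approx 1.0 \times 10^{7}\ \tfrac{\text{fills}}{\text{s}}
\quad\Longrightarrow\quad
\bar{t}_{\text{fill}} \lesssim \frac{1\ \text{s}}{1.0 \times 10^{7}} = 100\ \text{ns}
\]
```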

What I have done so far

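Started with a basic brute-force algorithm: two nested loops over rows and columns, plus a standard Pythagorean theorem check to decide whether a cell lies inside the circle, "drawing" the circle into a boolean[][].
throughput ~3 500 ops/sec.
Switched to filling with System.arraycopy instead of brute force.
throughput ~5 600 ops/sec.
Optimized the array initialization (using a cache).
throughput ~6 000 ops/sec.
Added margins to the rows and columns to avoid bounds checks during the algorithm.
throughput ~6 500 ops/sec.
Switched to Bresenham's circle algorithm (slightly modified to fill the circle) to avoid the "complex" Pythagorean check.
throughput ~6 500 ops/sec. :(
Switched from a 2D array to a 1D array.
throughput ~6 700 ops/sec.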
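For reference, the brute-force starting point from the first step might look like the sketch below (BruteForceDisc, fill and count are hypothetical names). Besides being the slowest variant, it can serve as a rough oracle when testing faster versions, with one caveat: it marks the closed disc dx² + dy² ≤ r², whereas the Bresenham span fill in the question writes 2y cells per row (it leaves out the rightmost column), so the two footprints are not identical.

```java
/** Hypothetical baseline: brute-force disc fill with the Pythagorean test. */
final class BruteForceDisc {
    /** Marks every cell with dx*dx + dy*dy <= radius*radius, clipped to the grid bounds. */
    static void fill(boolean[][] grid, int centerCol, int centerRow, int radius) {
        int r2 = radius * radius;
        int rowFrom = Math.max(0, centerRow - radius);
        int rowTo = Math.min(grid.length - 1, centerRow + radius);
        for (int row = rowFrom; row <= rowTo; row++) {
            int dy = row - centerRow;
            int colFrom = Math.max(0, centerCol - radius);
            int colTo = Math.min(grid[row].length - 1, centerCol + radius);
            for (int col = colFrom; col <= colTo; col++) {
                int dx = col - centerCol;
                if (dx * dx + dy * dy <= r2) {
                    grid[row][col] = true;
                }
            }
        }
    }

    /** Counts covered cells, e.g. to compare footprints of different fill strategies. */
    static int count(boolean[][] grid) {
        int n = 0;
        for (boolean[] row : grid) {
            for (boolean cell : row) {
                if (cell) {
                    n++;
                }
            }
        }
        return n;
    }

    public static void main(String[] args) {
        boolean[][] grid = new boolean[5][5];
        fill(grid, 2, 2, 2);
        System.out.println(count(grid)); // 13
    }
}
```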

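Now I am out of ideas, except converting the boolean[] into a byte[] and using bitmasks to set/get cells, if I have understood the suggestion in the paper correctly.

Any challengers?

Here is the JMH test: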
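If the boolean[] is swapped for bit storage as described, java.util.BitSet already provides the range operation that System.arraycopy is used for: BitSet.set(fromIndex, toIndex) sets a run of bits, and since BitSet stores its bits in long words, a run of 20 cells touches at most two words. The sketch below assumes the question's padded 1D layout; BitSetCoverage and coverRun are hypothetical names, and whether it actually beats System.arraycopy on a boolean[] would have to be confirmed with the JMH benchmark.

```java
import java.util.BitSet;

/**
 * Hypothetical BitSet-backed version of the question's padded 1D matrix:
 * one bit per cell instead of one boolean (typically one byte) per cell.
 */
final class BitSetCoverage {
    private final int radius;
    private final int matrixColCount; // padded width, as in KeyPointFilter
    private final BitSet matrix;

    BitSetCoverage(int colCount, int rowCount, int radius) {
        this.radius = radius;
        this.matrixColCount = colCount + 2 * radius;
        this.matrix = new BitSet(matrixColCount * (rowCount + 2 * radius));
    }

    private int index(int paddedRow, int paddedCol) {
        return paddedRow * matrixColCount + paddedCol;
    }

    /** Same contract as KeyPointFilter.isSet: image coordinates, margin added here. */
    boolean isSet(int col, int row) {
        return matrix.get(index(row + radius, col + radius));
    }

    /** Stand-in for System.arraycopy(ones, 0, matrix, index(paddedRow, paddedStartCol), length). */
    void coverRun(int paddedRow, int paddedStartCol, int length) {
        int from = index(paddedRow, paddedStartCol);
        matrix.set(from, from + length); // sets bits [from, from + length)
    }

    /** Clears all bits, replacing the per-call reallocation in init(). */
    void reset() {
        matrix.clear();
    }

    public static void main(String[] args) {
        BitSetCoverage coverage = new BitSetCoverage(640, 480, 10);
        coverage.coverRun(15, 13, 4); // image row 5, image cols 3..6
        System.out.println(coverage.isSet(3, 5)); // true
    }
}
```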

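import java.util.ArrayList;
import java.util.List;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

public class KeyPointFilterBenchmark {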
    private static final int DEFAULT_RADIUS = 10;

    @Benchmark
    public List<OpenCVKeyPoint> benchmarkFilterByRadius(KeyPointFilterState state) {
        return state.filter.filterByRadius(DEFAULT_RADIUS, state.list);
    }

    @State(Scope.Thread)
    public static class KeyPointFilterState {
        private static final int NUMBER_OF_POINTS = 12_000;
        private static final int IMAGE_WIDTH = 640;
        private static final int IMAGE_HEIGHT = 480;
        private static final int RESPONSE_RANGE = 255;
        private List<OpenCVKeyPoint> list;
        private KeyPointFilter filter;

        @Setup(Level.Trial)
        public void doSetup() {
            this.list = new ArrayList<>();
            for (int i = 0; i < NUMBER_OF_POINTS; i++) {
                double x = Math.random() * IMAGE_WIDTH;
                double y = Math.random() * IMAGE_HEIGHT;
                float response = (float) (Math.random() * RESPONSE_RANGE);
                list.add(new OpenCVKeyPoint(x, y, response));
            }
            this.filter = new KeyPointFilter(IMAGE_WIDTH, IMAGE_HEIGHT);
        }
    }
}

Current implementation:

import java.util.ArrayList;
import java.util.List;

public class KeyPointFilter {
    private boolean[] matrix;
    private final int rowCount;
    private final int colCount;
    private int matrixColCount;
    private int matrixRowCount;
    private boolean[] ones;
    private int radiusInitialized;

    public KeyPointFilter(int colCount, int rowCount) {
        this.colCount = colCount;
        this.rowCount = rowCount;
    }

    void init(int radius) {
        if (radiusInitialized == radius) {
            // Already initialized, just reset.
            this.matrix = new boolean[matrixRowCount * matrixColCount];
            return;
        }
        this.matrixRowCount = rowCount + radius * 2;
        this.matrixColCount = colCount + radius * 2;
        this.matrix = new boolean[matrixRowCount * matrixColCount];
        // Initialize a one array, to use in the coverAround arraycopy optimization.
        this.ones = new boolean[matrixColCount];
        for (int i = 0; i < ones.length; i++) {
            ones[i] = true;
        }
        radiusInitialized = radius;
    }

    public List<OpenCVKeyPoint> filterByRadius(int radius, List<OpenCVKeyPoint> input) {
        init(radius);
        List<OpenCVKeyPoint> filtered = new ArrayList<>();
        // Eliminating by covering
        for (OpenCVKeyPoint point : input) {
            int col = (int) point.getXPos();
            int row = (int) point.getYPos();
            if (!isSet(col, row)) {
                bresenhamFilledCircle(col, row, radius);
                filtered.add(point);
            }
        }
        return filtered;
    }

    void bresenhamFilledCircle(int col, int row, int radius) {
        // CHECKSTYLE IGNORE MagicNumber FOR NEXT 1 LINES.
        int d = (5 - radius * 4) / 4;
        int x = 0;
        int y = radius;
        int rowOffset = radius + row;
        int colOffset = radius + col;
        do {
            //Since we are filling a circle, we fill using System.arraycopy, from left to right.
            int yStart = colOffset - y;
            int yLength = 2 * y;
            // Row a bottom
            System.arraycopy(ones, 0, matrix, getIndex(rowOffset - x, yStart), yLength);
            if (x != 0) {
                int xStart = colOffset - x;
                int xLength = 2 * x;
                // Row a top
                System.arraycopy(ones, 0, matrix, getIndex(rowOffset + x, yStart), yLength);
                // Row b bottom
                System.arraycopy(ones, 0, matrix, getIndex(rowOffset - y, xStart), xLength);
                // Row b top
                System.arraycopy(ones, 0, matrix, getIndex(rowOffset + y, xStart), xLength);
            }
            if (d < 0) {
                d += 2 * x + 1;
            } else {
                d += 2 * (x - y) + 1;
                y--;
            }
            x++;
        } while (x <= y);
    }

    private int getIndex(int row, int col) {
        return row * matrixColCount + col;
    }

    private void debugArray() {
        StringBuilder actualResult = new StringBuilder();
        for (int row = 0; row < getRowCount(); row++) {
            for (int col = 0; col < getColCount(); col++) {
                actualResult.append(isSet(col, row) ? '1' : '0');
            }
            actualResult.append('\n');
        }
        System.out.println(actualResult);
    }

    public boolean isSet(int col, int row) {
        return matrix[getIndex(row + radiusInitialized, col + radiusInitialized)];
    }

    int getRowCount() {
        return rowCount;
    }

    int getColCount() {
        return colCount;
    }
}

Plus the keypoint class being used:

public class OpenCVKeyPoint {
    private final double xPos;
    private final double yPos;
    private final float response;

    public OpenCVKeyPoint(double xPos, double yPos, float response) {
        this.xPos = xPos;
        this.yPos = yPos;
        this.response = response;
    }

    public float getResponse() {
        return response;
    }

    public double getXPos() {
        return xPos;
    }

    public double getYPos() {
        return yPos;
    }
}

Answer 1

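You can cache more of the computations and inline the function calls wherever possible.

Try replacing filterByRadius with this and see if there is any improvement: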

public List<OpenCVKeyPoint> filterByRadius(final int radius, List<OpenCVKeyPoint> input) {
    init(radius);

    // Possibly give a hint to the arraylist on how much space to allocate from the start.
    List<OpenCVKeyPoint> filtered = new ArrayList<>();

    // calculate once
    final int d_init = (5 - radius * 4) / 4;

    // Eliminating by covering
    for (OpenCVKeyPoint point : input) {

        // FIXME do the points need to be doubles, only to be cast to int?
        int col = (int) point.getXPos();
        int row = (int) point.getYPos();

        if (!isSet(col, row)) {
            final int rowOffset = (radius + row) * matrixColCount;
            final int colOffset = radius + col;

            int d = d_init;
            int x = 0;
            int y = radius;
            do {
                final int yStart = colOffset - y;
                final int yLength = 2 * y;

                final int xByMatrixColCount = x * matrixColCount;
                final int rowOffsetPlusYStart = rowOffset + yStart;

                // Since we are filling a circle, we fill using System.arraycopy, from left to right.

                // Row a bottom
                System.arraycopy(ones, 0, matrix, (rowOffsetPlusYStart - xByMatrixColCount),
                        yLength);
                if (x != 0) {
                    // Row a top
                    System.arraycopy(ones, 0, matrix, (rowOffsetPlusYStart + xByMatrixColCount),
                            yLength);

                    // -----
                    final int xLength = 2 * x;
                    final int yByMatrixColCount = y * matrixColCount;
                    final int rowOffsetPlusXStart = rowOffset + colOffset - x;

                    // Row b bottom
                    System.arraycopy(ones, 0, matrix, (rowOffsetPlusXStart - yByMatrixColCount),
                            xLength);

                    // Row b top
                    System.arraycopy(ones, 0, matrix, (rowOffsetPlusXStart + yByMatrixColCount),
                            xLength);
                }
                if (d < 0) {
                    d += 2 * x + 1;
                } else {
                    d += 2 * (x - y) + 1;
                    y--;
                }
                x++;
            } while (x <= y);

            filtered.add(point);
        }
    }
    return filtered;
}

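This may not be a big improvement, but you asked for faster. I think this will be faster, but I have no measurements to back that up. If you benchmark it, I would love to know the results!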

Answer 2

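So, I came up with a nice optimization. The generic Bresenham algorithm paints the same positions several times near the top and bottom of the circle, but with a custom strategy we can have a dedicated painting routine for a specific radius, e.g. 10, that paints nothing twice and needs almost no computation. Here row and col are the keypoint's image coordinates; because of the radius-wide margin, they address the top-left corner of the circle's bounding box in the padded matrix. The custom strategy for a circle with radius 10 looks like this: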

System.arraycopy(ones, 0, matrix, getIndex(row, col + 7), 6);
System.arraycopy(ones, 0, matrix, getIndex(row + 1, col + 4), 12);
System.arraycopy(ones, 0, matrix, getIndex(row + 2, col + 3), 14);
System.arraycopy(ones, 0, matrix, getIndex(row + 3, col + 2), 16);
System.arraycopy(ones, 0, matrix, getIndex(row + 4, col + 1), 18);
System.arraycopy(ones, 0, matrix, getIndex(row + 5, col + 1), 18);
System.arraycopy(ones, 0, matrix, getIndex(row + 6, col + 1), 18);
System.arraycopy(ones, 0, matrix, getIndex(row + 7, col), 20);
System.arraycopy(ones, 0, matrix, getIndex(row + 8, col), 20);
System.arraycopy(ones, 0, matrix, getIndex(row + 9, col), 20);
System.arraycopy(ones, 0, matrix, getIndex(row + 10, col), 20);
System.arraycopy(ones, 0, matrix, getIndex(row + 11, col), 20);
System.arraycopy(ones, 0, matrix, getIndex(row + 12, col), 20);
System.arraycopy(ones, 0, matrix, getIndex(row + 13, col), 20);
System.arraycopy(ones, 0, matrix, getIndex(row + 14, col + 1), 18);
System.arraycopy(ones, 0, matrix, getIndex(row + 15, col + 1), 18);
System.arraycopy(ones, 0, matrix, getIndex(row + 16, col + 1), 18);
System.arraycopy(ones, 0, matrix, getIndex(row + 17, col + 2), 16);
System.arraycopy(ones, 0, matrix, getIndex(row + 18, col + 3), 14);
System.arraycopy(ones, 0, matrix, getIndex(row + 19, col + 4), 12);
System.arraycopy(ones, 0, matrix, getIndex(row + 20, col + 7), 6);
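The hand-written table above is specific to radius 10. As a sketch that is not part of this answer's code, the same table can be generated for any radius by running the question's Bresenham loop once and keeping, for each row, the widest run it would draw; filtering then costs one System.arraycopy per row, with no overdraw. CircleSpanTable, halfWidths and fill are hypothetical names. For radius 10 the generated half-widths h reproduce the offsets above: the run for row offset dy starts at col + 10 - h and has length 2h.

```java
/** Hypothetical generalisation of the hand-written radius-10 table: spans computed once per radius. */
final class CircleSpanTable {
    /** half[dy + radius] = half-width h of the run for row offset dy; the run covers [centre - h, centre + h). */
    static int[] halfWidths(int radius) {
        int[] half = new int[2 * radius + 1];
        // Same integer Bresenham walk as the question's bresenhamFilledCircle.
        int d = (5 - radius * 4) / 4;
        int x = 0;
        int y = radius;
        do {
            widen(half, radius, x, y);  // rows centre +/- x get runs of half-width y
            widen(half, radius, -x, y);
            widen(half, radius, y, x);  // rows centre +/- y get runs of half-width x
            widen(half, radius, -y, x);
            if (d < 0) {
                d += 2 * x + 1;
            } else {
                d += 2 * (x - y) + 1;
                y--;
            }
            x++;
        } while (x <= y);
        return half;
    }

    /** Keeps the widest run seen so far for row offset dy (the union of overlapping runs). */
    private static void widen(int[] half, int radius, int dy, int h) {
        int i = dy + radius;
        if (h > half[i]) {
            half[i] = h;
        }
    }

    /** One arraycopy per row; centreRow/centreCol are padded-matrix coordinates. */
    static void fill(boolean[] matrix, int matrixColCount, boolean[] ones,
                     int[] half, int centreRow, int centreCol) {
        int radius = (half.length - 1) / 2;
        for (int dy = -radius; dy <= radius; dy++) {
            int h = half[dy + radius];
            if (h > 0) {
                System.arraycopy(ones, 0, matrix, (centreRow + dy) * matrixColCount + centreCol - h, 2 * h);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(halfWidths(10)));
        // [3, 6, 7, 8, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, 8, 7, 6, 3]
    }
}
```

In KeyPointFilter, halfWidths would be computed in init() next to the ones array, so the per-point cost is just the row loop.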

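Benchmarked again: throughput increased and now reaches ~8 200 ops/sec.

It could probably go higher if I introduced threads and processed the list in parallel, but for now this throughput is good enough.

How can the following algorithm (Java) be tuned further to make it faster?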
