Question

I have the following Matrix4f class:

public class Matrix4f {
    private final static float EPSILON = 0.01f;

    private final static Matrix4f IDENTITY = new Matrix4f(new float[] {
        1.0f, 0.0f, 0.0f, 0.0f, //X column
        0.0f, 1.0f, 0.0f, 0.0f, //Y column
        0.0f, 0.0f, 1.0f, 0.0f, //Z column
        0.0f, 0.0f, 0.0f, 1.0f  //W column
    });

    private final float[] elements = new float[16];

    public Matrix4f() {

    }

    public Matrix4f(final float[] elements) {
        System.arraycopy(elements, 0, this.elements, 0, 16);
    }

    public Matrix4f multiply(final Matrix4f other) {
        float[] a = getElements();
        float[] b = other.getElements();
        return new Matrix4f(new float[] {
            a[0] * b[0] +   a[4] * b[1] +   a[8] * b[2] +   a[12] * b[3],
            a[1] * b[0] +   a[5] * b[1] +   a[9] * b[2] +   a[13] * b[3],
            a[2] * b[0] +   a[6] * b[1] +   a[10] * b[2] +  a[14] * b[3],
            a[3] * b[0] +   a[7] * b[1] +   a[11] * b[2] +  a[15] * b[3],   //X column

            a[0] * b[4] +   a[4] * b[5] +   a[8] * b[6] +   a[12] * b[7],
            a[1] * b[4] +   a[5] * b[5] +   a[9] * b[6] +   a[13] * b[7],
            a[2] * b[4] +   a[6] * b[5] +   a[10] * b[6] +  a[14] * b[7],
            a[3] * b[4] +   a[7] * b[5] +   a[11] * b[6] +  a[15] * b[7],   //Y column

            a[0] * b[8] +   a[4] * b[9] +   a[8] * b[10] +  a[12] * b[11],
            a[1] * b[8] +   a[5] * b[9] +   a[9] * b[10] +  a[13] * b[11],
            a[2] * b[8] +   a[6] * b[9] +   a[10] * b[10] + a[14] * b[11],
            a[3] * b[8] +   a[7] * b[9] +   a[11] * b[10] + a[15] * b[11],  //Z column

            a[0] * b[12] +  a[4] * b[13] +  a[8] * b[14] +  a[12] * b[15],
            a[1] * b[12] +  a[5] * b[13] +  a[9] * b[14] +  a[13] * b[15],
            a[2] * b[12] +  a[6] * b[13] +  a[10] * b[14] + a[14] * b[15],
            a[3] * b[12] +  a[7] * b[13] +  a[11] * b[14] + a[15] * b[15]  //W column            
        });
    }

    public FloatBuffer asFloatBuffer() {
        FloatBuffer floatBuffer = BufferUtils.createFloatBuffer(elements.length).put(elements);
        floatBuffer.flip();
        return floatBuffer;
    }

    public FloatBuffer writeToFloatBuffer(final FloatBuffer floatBuffer) {
        floatBuffer.clear();
        floatBuffer.put(elements);
        floatBuffer.flip();
        return floatBuffer;
    }

    float[] getElements() {
        return elements;
    }

    @Override
    public String toString() {
        return Arrays.toString(elements);
    }

    public static Matrix4f identity() {
        return IDENTITY;
    }

    public static Matrix4f scale(final float sx, final float sy, final float sz) {
        return new Matrix4f(new float[] {
            sx, 0.0f, 0.0f, 0.0f,   //X column
            0.0f, sy, 0.0f, 0.0f,   //Y column
            0.0f, 0.0f, sz, 0.0f,   //Z column
            0.0f, 0.0f, 0.0f, 1.0f  //W column
        });
    }

    public static Matrix4f translate(final float tx, final float ty, final float tz) {
        return new Matrix4f(new float[] {
            1.0f, 0.0f, 0.0f, 0.0f, //X column
            0.0f, 1.0f, 0.0f, 0.0f, //Y column
            0.0f, 0.0f, 1.0f, 0.0f, //Z column
            tx,    ty,    tz, 1.0f  //W column
        });
    }

    public static Matrix4f rotate(final float theta, final float x, final float y, final float z) {
        if (Math.abs(x * x + y * y + z * z - 1.0f) >= EPSILON) {
            throw new IllegalArgumentException("(x, y, z) is not a unit vector: x = " + x + ", y = " + y + ", z = " + z);
        }
        float thetaRad = (float)Math.toRadians(theta);
        float cosTheta = (float)Math.cos(thetaRad);
        float sinTheta = (float)Math.sin(thetaRad);
        float cosThetaRes = 1.0f - cosTheta;
        return new Matrix4f(new float[] {
            cosTheta + x * x * cosThetaRes,     y * x * cosThetaRes + z * sinTheta, z * x * cosThetaRes - y * sinTheta, 0.0f,   //X column
            x * y * cosThetaRes - z * sinTheta, cosTheta + y * y * cosThetaRes,     z * y * cosThetaRes + x * sinTheta, 0.0f,   //Y column
            x * z * cosThetaRes + y * sinTheta, y * z * cosThetaRes - x * sinTheta, cosTheta + z * z * cosThetaRes,     0.0f,   //Z column
            0.0f,                               0.0f,                               0.0f,                               1.0f    //W column
        });
    }

    public static Matrix4f frustum(final float left, final float right, final float bottom, final float top, final float near, final float far) {
        return new Matrix4f(new float[] {
            2 * near / (right - left),          0.0f,                               0.0f,                           0.0f,   //X column
            0.0f,                               2 * near / (top - bottom),          0.0f,                           0.0f,   //Y column
            (right + left) / (right - left),    (top + bottom) / (top - bottom),    (near + far) / (near - far),    -1.0f,  //Z column
            0.0f,                               0.0f,                               2 * near * far / (near - far),  0.0f    //W column
        });
    }

    public static Matrix4f perspective(final float fovy, final float aspect, final float near, final float far) {
        float y2 = near * (float)Math.tan(Math.toRadians(fovy * 0.5f));
        float y1 = -y2;
        float x1 = y1 * aspect;
        float x2 = y2 * aspect;
        return frustum(x1, x2, y1, y2, near, far);
    }

    public static Matrix4f multiply(final Matrix4f... matrices) {
        Matrix4f output = identity();
        for (Matrix4f matrix : matrices) {
            output = output.multiply(matrix);
        }
        return output;
    }
}

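When profiling my 3D application, almost everything looks fine, except that an abnormally large number of float[] instances are being created. This may be normal behavior, since a lot of matrix multiplications are being done.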

If I were to change it to use 16 floats rather than a float[], would there be a significant performance improvement (and why)?

A few minutes ago I optimized the following part (below), and it gave me a huge performance boost:

public FloatBuffer asFloatBuffer() {
    FloatBuffer floatBuffer = BufferUtils.createFloatBuffer(elements.length).put(elements);
    floatBuffer.flip();
    return floatBuffer;
}

public FloatBuffer writeToFloatBuffer(final FloatBuffer floatBuffer) {
    floatBuffer.clear();
    floatBuffer.put(elements);
    floatBuffer.flip();
    return floatBuffer;
}

What I effectively did was get rid of new instances. Does something similar apply to the floats, and why?

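Update: I made a new version, and it definitely shows an improvement! I used to be able to draw 240 times, which caused stuttering every second because of garbage collection overhead. Now I can draw 24000 times using methods that produce no garbage at all. The actual limiting factor now is most likely that I am simply making too many OpenGL calls or something, which is not a problem, because I should look for other approaches anyway if I were sending that much data to OpenGL in a real scenario.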

Updated code:

@Override
protected void render(final double msDelta) {
    glClearColor(0.0f, 0.25f, 0.0f, 1.0f);
    glClearDepthf(1f);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    testProgram.use();

    FloatBuffer modelViewMatrixBuffer = BufferUtils.createFloatBuffer(16);
    Matrix4f modelviewMatrix = new Matrix4f();

    for (int i = 0; i < 24000; i++) {
        float fVar = i + currentTime / 1000f * 0.3f;
        modelviewMatrix.identity()
                .translate(0.0f, 0.0f, -8.0f)   //translate
                .rotate(currentTime / 1000f * 45.0f, 0.0f, 1.0f, 0.0f)  //rotate around Y
                .rotate(currentTime / 1000f * 21.0f, 1.0f, 0.0f, 0.0f)  //rotate around X
                .translate(
                    (float)Math.sin(2.1f * fVar) * 2.0f,
                    (float)Math.cos(1.7f * fVar) * 2.0f,
                    (float)Math.sin(1.3f * fVar) * (float)Math.cos(1.5f * fVar) * 2.0f
                );  //translate
        glUniformMatrix4(MODELVIEW_LOCATION, false, modelviewMatrix.writeToFloatBuffer(modelViewMatrixBuffer));    
        glDrawArrays(GL_TRIANGLES, 0, 36);
    }
}

public class Matrix4f {
    private final static float EPSILON = 0.01f;    
    private final static int LENGTH = 16;

    private float elem0 = 0.0f, elem1 = 0.0f, elem2 = 0.0f, elem3 = 0.0f,
            elem4 = 0.0f, elem5 = 0.0f, elem6 = 0.0f, elem7 = 0.0f,
            elem8 = 0.0f, elem9 = 0.0f, elem10 = 0.0f, elem11 = 0.0f,
            elem12 = 0.0f, elem13 = 0.0f, elem14 = 0.0f, elem15 = 0.0f;

    public Matrix4f() {

    }

    public Matrix4f(final float elem0, final float elem1, final float elem2, final float elem3,
            final float elem4, final float elem5, final float elem6, final float elem7,
            final float elem8, final float elem9, final float elem10, final float elem11, 
            final float elem12, final float elem13, final float elem14, final float elem15) {
        set(elem0, elem1, elem2, elem3, elem4, elem5, elem6, elem7, elem8, elem9, elem10, elem11, elem12, elem13, elem14, elem15);
    }

    public Matrix4f identity() {
        set(
            1.0f, 0.0f, 0.0f, 0.0f, //X column
            0.0f, 1.0f, 0.0f, 0.0f, //Y column
            0.0f, 0.0f, 1.0f, 0.0f, //Z column
            0.0f, 0.0f, 0.0f, 1.0f   //W column
        );
        return this;
    }

    public Matrix4f multiply(final Matrix4f other) {
        return multiply(
            other.elem0, other.elem1, other.elem2, other.elem3, 
            other.elem4, other.elem5, other.elem6, other.elem7, 
            other.elem8, other.elem9, other.elem10, other.elem11, 
            other.elem12, other.elem13, other.elem14, other.elem15
        );
    }

    public Matrix4f multiply(final float mul0, final float mul1, final float mul2, final float mul3,
            final float mul4, final float mul5, final float mul6, final float mul7,
            final float mul8, final float mul9, final float mul10, final float mul11,
            final float mul12, final float mul13, final float mul14, final float mul15) {
        set(
            this.elem0 * mul0 +   this.elem4 * mul1 +   this.elem8 * mul2 +   this.elem12 * mul3,
            this.elem1 * mul0 +   this.elem5 * mul1 +   this.elem9 * mul2 +   this.elem13 * mul3,
            this.elem2 * mul0 +   this.elem6 * mul1 +   this.elem10 * mul2 +  this.elem14 * mul3,
            this.elem3 * mul0 +   this.elem7 * mul1 +   this.elem11 * mul2 +  this.elem15 * mul3,   //X column

            this.elem0 * mul4 +   this.elem4 * mul5 +   this.elem8 * mul6 +   this.elem12 * mul7,
            this.elem1 * mul4 +   this.elem5 * mul5 +   this.elem9 * mul6 +   this.elem13 * mul7,
            this.elem2 * mul4 +   this.elem6 * mul5 +   this.elem10 * mul6 +  this.elem14 * mul7,
            this.elem3 * mul4 +   this.elem7 * mul5 +   this.elem11 * mul6 +  this.elem15 * mul7,   //Y column

            this.elem0 * mul8 +   this.elem4 * mul9 +   this.elem8 * mul10 +  this.elem12 * mul11,
            this.elem1 * mul8 +   this.elem5 * mul9 +   this.elem9 * mul10 +  this.elem13 * mul11,
            this.elem2 * mul8 +   this.elem6 * mul9 +   this.elem10 * mul10 + this.elem14 * mul11,
            this.elem3 * mul8 +   this.elem7 * mul9 +   this.elem11 * mul10 + this.elem15 * mul11,  //Z column

            this.elem0 * mul12 +  this.elem4 * mul13 +  this.elem8 * mul14 +  this.elem12 * mul15,
            this.elem1 * mul12 +  this.elem5 * mul13 +  this.elem9 * mul14 +  this.elem13 * mul15,
            this.elem2 * mul12 +  this.elem6 * mul13 +  this.elem10 * mul14 + this.elem14 * mul15,
            this.elem3 * mul12 +  this.elem7 * mul13 +  this.elem11 * mul14 + this.elem15 * mul15  //W column            
        );
        return this;
    }

    public Matrix4f scale(final float sx, final float sy, final float sz) {
        return multiply(
            sx, 0.0f, 0.0f, 0.0f,   //X column
            0.0f, sy, 0.0f, 0.0f,   //Y column
            0.0f, 0.0f, sz, 0.0f,   //Z column
            0.0f, 0.0f, 0.0f, 1.0f  //W column
        );
    }

    public Matrix4f translate(final float tx, final float ty, final float tz) {
        return multiply(
            1.0f, 0.0f, 0.0f, 0.0f, //X column
            0.0f, 1.0f, 0.0f, 0.0f, //Y column
            0.0f, 0.0f, 1.0f, 0.0f, //Z column
            tx,    ty,    tz, 1.0f  //W column
        );
    }

    public Matrix4f rotate(final float theta, final float x, final float y, final float z) {
        if (Math.abs(x * x + y * y + z * z - 1.0f) >= EPSILON) {
            throw new IllegalArgumentException("(x, y, z) is not a unit vector: x = " + x + ", y = " + y + ", z = " + z);
        }
        float thetaRad = (float)Math.toRadians(theta);
        float cosTheta = (float)Math.cos(thetaRad);
        float sinTheta = (float)Math.sin(thetaRad);
        float cosThetaRes = 1.0f - cosTheta;
        return multiply(
            cosTheta + x * x * cosThetaRes,     y * x * cosThetaRes + z * sinTheta, z * x * cosThetaRes - y * sinTheta, 0.0f,   //X column
            x * y * cosThetaRes - z * sinTheta, cosTheta + y * y * cosThetaRes,     z * y * cosThetaRes + x * sinTheta, 0.0f,   //Y column
            x * z * cosThetaRes + y * sinTheta, y * z * cosThetaRes - x * sinTheta, cosTheta + z * z * cosThetaRes,     0.0f,   //Z column
            0.0f,                               0.0f,                               0.0f,                               1.0f    //W column
        );
    }

    public Matrix4f frustum(final float left, final float right, final float bottom, final float top, final float near, final float far) {
        return multiply(
            2 * near / (right - left),          0.0f,                               0.0f,                           0.0f,   //X column
            0.0f,                               2 * near / (top - bottom),          0.0f,                           0.0f,   //Y column
            (right + left) / (right - left),    (top + bottom) / (top - bottom),    (near + far) / (near - far),    -1.0f,  //Z column
            0.0f,                               0.0f,                               2 * near * far / (near - far),  0.0f    //W column
        );
    }

    public Matrix4f perspective(final float fovy, final float aspect, final float near, final float far) {
        float y2 = near * (float)Math.tan(Math.toRadians(fovy * 0.5f));
        float y1 = -y2;
        float x1 = y1 * aspect;
        float x2 = y2 * aspect;
        return frustum(x1, x2, y1, y2, near, far);
    }

    public FloatBuffer asFloatBuffer() {
        FloatBuffer floatBuffer = BufferUtils.createFloatBuffer(LENGTH)
                .put(elem0).put(elem1).put(elem2).put(elem3)
                .put(elem4).put(elem5).put(elem6).put(elem7)
                .put(elem8).put(elem9).put(elem10).put(elem11)
                .put(elem12).put(elem13).put(elem14).put(elem15);
        floatBuffer.flip();
        return floatBuffer;
    }

    public FloatBuffer writeToFloatBuffer(final FloatBuffer floatBuffer) {
        floatBuffer.clear();
        floatBuffer.put(elem0).put(elem1).put(elem2).put(elem3)
                .put(elem4).put(elem5).put(elem6).put(elem7)
                .put(elem8).put(elem9).put(elem10).put(elem11)
                .put(elem12).put(elem13).put(elem14).put(elem15);
        floatBuffer.flip();
        return floatBuffer;
    }

    private void set(final float elem0, final float elem1, final float elem2, final float elem3,
            final float elem4, final float elem5, final float elem6, final float elem7,
            final float elem8, final float elem9, final float elem10, final float elem11, 
            final float elem12, final float elem13, final float elem14, final float elem15) {
        this.elem0 = elem0;
        this.elem1 = elem1;
        this.elem2 = elem2;
        this.elem3 = elem3;
        this.elem4 = elem4;
        this.elem5 = elem5;
        this.elem6 = elem6;
        this.elem7 = elem7;
        this.elem8 = elem8;
        this.elem9 = elem9;
        this.elem10 = elem10;
        this.elem11 = elem11;
        this.elem12 = elem12;
        this.elem13 = elem13;
        this.elem14 = elem14;
        this.elem15 = elem15;
    }

    @Override
    public String toString() {
        return "[" + 
                elem0 + ", "  + elem1 + ", "  + elem2 + ", "  + elem3 + ", " +
                elem4 + ", "  + elem5 + ", "  + elem6 + ", "  + elem7 + ", " +
                elem8 + ", "  + elem9 + ", "  + elem10 + ", " + elem11 + ", " +
                elem12 + ", " + elem13 + ", " + elem14 + ", " + elem15 + "]";
    }
}

Answer 1

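Arrays in Java are objects, so they are created on the heap and are subject to garbage collection, which is one of the biggest performance killers (a collection can pause the application until it finishes). The fewer allocations you make, the better! As objects, they also carry extra memory overhead: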

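Every Java object has a header that contains information important to the JVM. The most important part is a reference to the object's class (one machine word), and there are some flags used by the garbage collector and for managing synchronization (since every object can be synchronized on), which take up another machine word (using partial words would be bad for performance). So that is 2 words, which is 8 bytes on 32-bit systems and 16 bytes on 64-bit. Arrays additionally need an int field for the array length, which is another 4 bytes, possibly 8 on 64-bit systems.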

Source: https://softwareengineering.stackexchange.com/questions/162546/why-the-overhead-when-allocating-objects-arrays-in-java
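To put rough numbers on the quoted description, here is a back-of-the-envelope sketch. The class `ArrayFootprint`, its method names, and the 8-byte alignment are my own assumptions for illustration; real JVMs differ (compressed references, header layout), so treat the results as estimates, not measurements.

```java
// Rough estimate only: header size and alignment are assumptions taken from
// the quoted description, not measurements of any particular JVM.
final class ArrayFootprint {
    // Round n up to the next multiple of 8 (assumed object alignment).
    static long alignTo8(long n) {
        return (n + 7) / 8 * 8;
    }

    // Estimated bytes for a float[length]:
    // two header words + a 4-byte length field + 4 bytes per element.
    static long floatArrayBytes(int length, int wordBytes) {
        long header = 2L * wordBytes;
        long lengthField = 4;
        long payload = 4L * length;
        return alignTo8(header + lengthField + payload);
    }

    // Estimated bytes for an object holding `count` float fields directly
    // (two header words + 4 bytes per field, no array length field).
    static long floatFieldsBytes(int count, int wordBytes) {
        return alignTo8(2L * wordBytes + 4L * count);
    }

    public static void main(String[] args) {
        System.out.println("float[16], 8-byte words: " + floatArrayBytes(16, 8));
        System.out.println("16 float fields, 8-byte words: " + floatFieldsBytes(16, 8));
    }
}
```

Keep in mind that a Matrix4f holding a float[] pays for both its own header and the array's header, plus the reference between them, so the per-matrix saving from inlining the fields is larger than the difference between these two figures.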

PS: just a side note - try running your application with -XX:+DoEscapeAnalysis, which might reduce the number of allocations.

Answer 2

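Using 16 floats instead of an array will save about 16 bytes. Having a fixed number of local variables can also help you avoid creating new objects. For example, you can use mutable matrices and avoid creating a new object.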

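public Matrix multiply(Matrix m) {
    // Copy the current fields into locals first, so the writes below
    // never read a value that has already been overwritten.
    float a11 = this.a11;
    // etc
    float a44 = this.a44;

    this.a11 = ...;
    // etc
    this.a44 = ...;
    return this;
}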

Note: this operation produces no garbage at all.
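To make the pattern concrete, here is a minimal complete sketch on a 2x2 matrix, which keeps the example short. The class `Mat2` and its field names are hypothetical and only stand in for the 4x4 case.

```java
// Hypothetical 2x2 matrix, kept small so the pattern is easy to follow.
final class Mat2 {
    float a11, a12, a21, a22;

    Mat2(float a11, float a12, float a21, float a22) {
        this.a11 = a11;
        this.a12 = a12;
        this.a21 = a21;
        this.a22 = a22;
    }

    // Computes this = this * m in place, without allocating anything.
    Mat2 multiply(Mat2 m) {
        // Snapshot both operands into locals before writing any field, so a
        // half-updated value is never read. This also keeps x.multiply(x) correct.
        float a11 = this.a11, a12 = this.a12, a21 = this.a21, a22 = this.a22;
        float b11 = m.a11, b12 = m.a12, b21 = m.a21, b22 = m.a22;

        this.a11 = a11 * b11 + a12 * b21;
        this.a12 = a11 * b12 + a12 * b22;
        this.a21 = a21 * b11 + a22 * b21;
        this.a22 = a21 * b12 + a22 * b22;
        return this;
    }
}
```

Returning `this` allows chained calls; the trade-off is that the matrix is mutable, so callers must not share an instance they expect to stay constant.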

Answer 3

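First, I don't see why you need to instantiate the Matrix4f class for every matrix. You could operate directly on float[] objects and save one allocation per matrix. This would save 8 bytes per matrix (possibly 16 bytes on 64-bit systems, I'm not sure). That is not a big deal, but since those 8 bytes don't buy you anything, I think it is worth doing.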

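Second, I'm guessing that whenever you want to translate, rotate, or scale something, you call the method that creates the appropriate transformation matrix and then multiply it by the matrix representing the thing you are transforming.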

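This seems wasteful to me, because

You allocate two arrays, one of which you throw away immediately.
You do a lot of multiplying by 1.0 or 0.0 and adding up the results, so you do far more arithmetic than necessary.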

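I would rewrite your translate, rotate and scale methods so that each takes an additional argument - the matrix you want to transform - and does only the minimum amount of arithmetic and, more importantly, only one allocation. For example: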

public static float[] scale(float sx, float sy, float sz, float[] operand) {
    float[] toReturn = new float[16];
    for (int i = 0; i <= 3; i++) {
        toReturn[ i ] = operand[ i ] * sx;
        toReturn[ i + 4 ] = operand[ i + 4 ] * sy;
        toReturn[ i + 8 ] = operand[ i + 8 ] * sz;
        toReturn[ i + 12 ] = operand[ i + 12 ];
    }
    return toReturn;
}

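If you don't need to keep the original matrix when applying one of these transformations, you can make each transformation operate directly on the matrix. This saves you two allocations rather than just one.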

For example, in the case where you don't want to keep the original matrix, your scale method could look like this.

public static void scale(float sx, float sy, float sz, float[] operand) {
    for (int i = 0; i <= 3; i++) {
        operand[ i ] *= sx;
        operand[ i + 4 ] *= sy;
        operand[ i + 8 ] *= sz;
    }
}

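This just modifies the existing matrix. It means you get rid of two float[] allocations - one for the transformation matrix and one for the result matrix.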

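If you do need to keep the original matrix, you can write it the first way, which creates the result matrix but not the transformation matrix, so you halve the number of allocations.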
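The same idea carries over to translation. As a sketch, assuming the column-major layout used in the question (elements 12 to 15 form the W column), post-multiplying by a translation matrix only changes the W column, so only four elements need updating. The class name `InPlaceTransforms` and the `identity` helper are mine, added for illustration.

```java
final class InPlaceTransforms {
    // Returns a fresh column-major 4x4 identity matrix (helper for the example).
    static float[] identity() {
        float[] m = new float[16];
        m[0] = 1f;
        m[5] = 1f;
        m[10] = 1f;
        m[15] = 1f;
        return m;
    }

    // Computes m = m * T(tx, ty, tz) in place, where T is the translation
    // matrix from the question. The first three columns of the product equal
    // those of m, so only the W column (indices 12..15) has to be rewritten:
    //   newW = tx * X + ty * Y + tz * Z + W
    static void translate(float tx, float ty, float tz, float[] m) {
        for (int i = 0; i <= 3; i++) {
            m[i + 12] += m[i] * tx + m[i + 4] * ty + m[i + 8] * tz;
        }
    }
}
```

This does 12 multiplications instead of the 64 in a general 4x4 product and allocates nothing.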

Answer 4

Yes, arrays in Java are represented as objects. You can test this yourself:

private final float[] elements = new float[16];

if (elements instanceof Object){ 
   System.out.println("Arrays are Objects!");
}

So in terms of memory you should see a benefit, although I'm not sure how much. I'm curious, so if you find out, please share :)
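For a version of that check you can paste into a file and run, here is a small sketch. The class name `ArrayIsObject` and the helper `runtimeClassName` are made up for the example; the reflection calls are standard `java.lang.Class` methods.

```java
final class ArrayIsObject {
    // Returns the runtime class name of a float array.
    static String runtimeClassName(float[] a) {
        return a.getClass().getName();
    }

    public static void main(String[] args) {
        float[] elements = new float[16];

        // Legal without a cast: every array type is assignable to Object.
        Object asObject = elements;

        System.out.println(asObject instanceof float[]);
        System.out.println(elements.getClass().isArray());
        System.out.println(elements.getClass().getSuperclass());
        System.out.println(runtimeClassName(elements));
    }
}
```

The array class has `java.lang.Object` as its superclass, which is why an array is allocated on the heap and carries an object header like any other instance.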

Answer 5

This answers the part of the question about the large number of float[] instances on the heap.

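It seems to me that you have one of two problems: either too many Matrix4f instances are being created that are actually in use (they have live references), or some instances are not being garbage collected when they should be.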

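The second could become a big problem. As far as I can see, nothing prevents Matrix4f from being immutable. Right now it is not, because the elements array escapes the class through getElements(), and arrays are always mutable. On top of that, some code could obtain a reference to that float[] and never release it, which pollutes the heap. My suggestion is to remove this method if you don't need it or, if you really do need it, either return a copy or provide an accessor that takes a row and a column.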

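Using float fields instead of a float array should reduce the number of allocations (the float[] allocations), but the heap may still be polluted with Matrix4f instances, so be careful.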

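Note that the elements array also escapes in the asFloatBuffer method. So again, I would consider either eliminating this method entirely or using a copy - that way at least the Matrix4f objects will be garbage collected.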
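One way to keep the backing array from escaping is to copy on the way in and on the way out, and to offer element access by row and column. The following is only a sketch under that assumption; `ImmutableMatrix4f`, `get` and `toArray` are names I made up, and the indexing assumes the column-major layout from the question.

```java
final class ImmutableMatrix4f {
    private final float[] elements;

    // Defensive copy on construction: later changes to `source` cannot
    // affect this matrix.
    ImmutableMatrix4f(float[] source) {
        if (source.length != 16) {
            throw new IllegalArgumentException("expected 16 elements, got " + source.length);
        }
        this.elements = source.clone();
    }

    // Read access without exposing the array. Column-major layout:
    // column c occupies indices 4*c .. 4*c + 3.
    float get(int row, int column) {
        if (row < 0 || row > 3 || column < 0 || column > 3) {
            throw new IndexOutOfBoundsException("row = " + row + ", column = " + column);
        }
        return elements[column * 4 + row];
    }

    // Defensive copy on the way out: callers receive their own array.
    float[] toArray() {
        return elements.clone();
    }
}
```

Note the trade-off with the rest of this thread: `toArray` allocates on every call, so in a hot render loop the `get` accessor or a write-into-caller-buffer method is the cheaper choice.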

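As for performance, I would first look at repeated computations: for example, x*cosThetaRes and similar products are used in several places but recomputed every time. Another thing is to look for a different algorithm that needs fewer operations to get the result, especially since you know the size in advance. I'm not sure, but I guess the Strassen algorithm might help, as it is relatively simple.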
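As a sketch of the first point, the shared products in the rotation formula from the question can be computed once and reused. `RotationSketch` and `writeRotation` are hypothetical names, and the result is written into a caller-supplied array so the example allocates nothing. The JIT may already eliminate some of these common subexpressions, so measure before assuming a gain.

```java
final class RotationSketch {
    // Writes the column-major rotation matrix for an angle (in degrees) around
    // the unit axis (x, y, z) into `out`, which must have at least 16 elements.
    static void writeRotation(float thetaDegrees, float x, float y, float z, float[] out) {
        double rad = Math.toRadians(thetaDegrees);
        float c = (float) Math.cos(rad);
        float s = (float) Math.sin(rad);
        float t = 1.0f - c;

        // Each shared product is computed exactly once.
        float xt = x * t, yt = y * t, zt = z * t;
        float xs = x * s, ys = y * s, zs = z * s;
        float xy = x * yt, xz = x * zt, yz = y * zt;

        out[0] = c + x * xt;  out[1] = xy + zs;     out[2] = xz - ys;      out[3] = 0f;   // X column
        out[4] = xy - zs;     out[5] = c + y * yt;  out[6] = yz + xs;      out[7] = 0f;   // Y column
        out[8] = xz + ys;     out[9] = yz - xs;     out[10] = c + z * zt;  out[11] = 0f;  // Z column
        out[12] = 0f;         out[13] = 0f;         out[14] = 0f;          out[15] = 1f;  // W column
    }
}
```

Because the multiplications are grouped differently from the original expressions, individual elements can differ from the original in the last bit of floating-point precision.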

In terms of memory usage, is it better to use a float[] array or 16 floats?
