Question

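My OpenGL ES program currently has a performance bottleneck. I expected it to run well: it uses VBOs, a texture atlas, only a few binds per draw call, and so on. But when many sprites are on screen at once, performance drops sharply. I found that the bottleneck is CPU-bound (somewhat to my surprise). More precisely, it is in a method that computes the screen position of each rectangle's four vertices (x1, y1, x2, y2, x3, y3, x4, y4). These positions are used for collision detection. The method repeats what the shader already does, and I suspect that many CPU cycles are spent on the MV multiplication: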

 Matrix.multiplyMV(resultVec, 0, mModelMatrix, 0, rhsVec, 0);

rhsVec is a float array that stores a vertex as described above.

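Since this seems to be the bottleneck, I'm wondering how I can access the same vector from the shader, for example at the point where the clip coordinates are calculated. Clip coordinates would do, or even better, the coordinates from further down the rendering pipeline.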

Vertex shader

uniform mat4 u_MVPMatrix;
uniform mat4 u_MVMatrix;
attribute vec4 a_Position;
attribute vec2 a_TexCoordinate;
varying vec2 v_TexCoordinate;

void main()
{
   // pass the texture coordinate through to the fragment shader
   v_TexCoordinate = a_TexCoordinate;

   gl_Position = u_MVPMatrix * a_Position;
}

Snippet from onSurfaceCreated

    final int vertexShaderHandle = ShaderHelper.compileShader(GLES20.GL_VERTEX_SHADER, vertexShader);
    final int fragmentShaderHandle = ShaderHelper.compileShader(GLES20.GL_FRAGMENT_SHADER, fragmentShader);

    mProgramHandle = ShaderHelper.createAndLinkProgram(vertexShaderHandle, fragmentShaderHandle,
            new String[] {"a_Position",  "a_Color", "a_Normal", "a_TexCoordinate"});

    textureHandle = TextureHelper.loadTexture(context);

    GLES20.glUseProgram(mProgramHandle);

    mMVPMatrixHandle = GLES20.glGetUniformLocation(mProgramHandle, "u_MVPMatrix");
    mMVMatrixHandle = GLES20.glGetUniformLocation(mProgramHandle, "u_MVMatrix");
    //mColorHandle = GLES20.glGetAttribLocation(mProgramHandle, "a_Color");
    mTextureCoordinateHandle = GLES20.glGetAttribLocation(mProgramHandle, "a_TexCoordinate");

    mPositionHandle = GLES20.glGetAttribLocation(mProgramHandle, "a_Position");

The method that does the vertex transformation (the bottleneck)

   private void calcPos(int index) {

    int k = 0;
    for (int i = 0; i < 18; i += 3) {

        rhsVec[0] = vertices[0 + i];
        rhsVec[1] = vertices[1 + i];
        rhsVec[2] = vertices[2 + i];
        rhsVec[3] = 1;

        // *** Step 1 : Getting to eye coordinates ***

        Matrix.multiplyMV(resultVec, 0, mModelMatrix, 0, rhsVec, 0);

        // *** Step 2 : Getting to clip coordinates ***

        float[] rhsVec2 = resultVec;

        Matrix.multiplyMV(resultVec2, 0, mProjectionMatrix, 0, rhsVec2, 0);


        // *** Step 3 : Getting to normalized device coordinates ***

        float inv_w = 1 / resultVec2[3];

        for (int j = 0; j < resultVec2.length - 1; j++) {

            resultVec2[j] = inv_w * resultVec2[j];
        }

        float xPos = (resultVec2[0] * 0.5f + 0.5f) * game_width;

        float yPos = (resultVec2[1] * 0.5f + 0.5f) * game_height;

        float zPos = (1 + resultVec2[2]) * 0.5f;

        SpriteData sD = spriteDataArrayList.get(index);

        switch (k) {

            case 0:
                sD.xPos[0] = xPos;
                sD.yPos[0] = yPos;
                break;

            case 1:
                sD.xPos[2] = xPos;
                sD.yPos[2] = yPos;
                break;

            case 2:
                sD.xPos[3] = xPos;
                sD.yPos[3] = yPos;
                break;

            case 3:
                sD.xPos[1] = xPos;
                sD.yPos[1] = yPos;
                break;
        }
        k++; 

        if (i == 3) {
            i += 9;
        }

    }
}

This method is called for every sprite, so for 100 sprites it runs 100 times. Could the MV multiplications be what is hurting performance?

Answer 1

To answer the main question: I don't think it is possible to get the transformed vertices back from the GPU.

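As a first step, optimize your loop. First of all, don't repeat work inside the loop when it always produces the same result; do it once, outside the loop. That applies especially to function or property calls.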

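Next, you can multiply the two matrices together so that both transforms are applied, in order, with a single matrix multiplication per vertex. Note, though, that the final result then appears to be transformed straight back into screen space.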
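The matrix-combining idea can be checked with a minimal plain-Java sketch. It assumes column-major 4x4 matrices (the OpenGL layout); the class `MvpCombineSketch`, its helper methods, and the sample matrices are made up for illustration. On Android, `android.opengl.Matrix.multiplyMM` and `multiplyMV` would take the place of the helpers.

```java
// Plain-Java sketch; names and sample values are hypothetical.
class MvpCombineSketch {

    // Returns lhs * rhs for column-major 4x4 matrices (element (row, col) at col * 4 + row).
    static float[] multiplyMM(float[] lhs, float[] rhs) {
        float[] out = new float[16];
        for (int col = 0; col < 4; col++) {
            for (int row = 0; row < 4; row++) {
                float sum = 0f;
                for (int k = 0; k < 4; k++) {
                    sum += lhs[k * 4 + row] * rhs[col * 4 + k];
                }
                out[col * 4 + row] = sum;
            }
        }
        return out;
    }

    // Returns m * v for a column-major 4x4 matrix and a 4-component vector.
    static float[] multiplyMV(float[] m, float[] v) {
        float[] out = new float[4];
        for (int row = 0; row < 4; row++) {
            out[row] = m[row] * v[0] + m[4 + row] * v[1]
                     + m[8 + row] * v[2] + m[12 + row] * v[3];
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical matrices: a translation by (2, 3, 0) and a simple scale.
        float[] model = {1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  2, 3, 0, 1};
        float[] projection = {0.5f, 0, 0, 0,  0, 0.25f, 0, 0,  0, 0, 1, 0,  0, 0, 0, 1};
        float[] v = {1, 1, 0, 1};

        // Two multiplications per vertex, as in the question.
        float[] twoStep = multiplyMV(projection, multiplyMV(model, v));

        // Combine once per sprite, then one multiplication per vertex.
        float[] mvp = multiplyMM(projection, model);
        float[] oneStep = multiplyMV(mvp, v);

        System.out.println(java.util.Arrays.toString(twoStep));
        System.out.println(java.util.Arrays.toString(oneStep));
    }
}
```

Both lines print the same vector. The order matters: the model transform is applied first, so the combined matrix is projection times model, not the other way round.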

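You are copying data and then using the copy without changing it. I know the matrix multiplication probably expects four floats or a Vec4, but you could write a matrix multiply that avoids the copy and fills in the w parameter itself.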
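A copy-free multiply along those lines might look like the sketch below. `multiplyMV3` is a hypothetical helper, not part of `android.opengl.Matrix`; it assumes a column-major 4x4 matrix and reads x, y, z directly from the packed vertex array, treating w as 1.

```java
// Plain-Java sketch; multiplyMV3 is a hypothetical helper, not an Android API.
class MultiplyMV3Sketch {

    // result[resultOffset..+3] = m * (x, y, z, 1), where x, y, z are read
    // from src starting at srcOffset. The matrix is column-major 4x4.
    static void multiplyMV3(float[] result, int resultOffset,
                            float[] m, int mOffset,
                            float[] src, int srcOffset) {
        float x = src[srcOffset];
        float y = src[srcOffset + 1];
        float z = src[srcOffset + 2];
        for (int row = 0; row < 4; row++) {
            result[resultOffset + row] = m[mOffset + row] * x
                                       + m[mOffset + 4 + row] * y
                                       + m[mOffset + 8 + row] * z
                                       + m[mOffset + 12 + row]; // w == 1
        }
    }

    public static void main(String[] args) {
        // Two packed vertices, 3 floats each (sample data).
        float[] vertices = {0, 0, 0,  1, 2, 3};
        // Translation by (10, 20, 30).
        float[] translate = {1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  10, 20, 30, 1};
        float[] out = new float[4];

        // Transform the second vertex in place in the array: no rhsVec copy.
        multiplyMV3(out, 0, translate, 0, vertices, 3);
        System.out.println(java.util.Arrays.toString(out));
    }
}
```

Because w is fixed at 1, the last matrix column is simply added, which also saves four multiplications per vertex.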

Avoid calculations whose results you never end up using.

Cache the results and don't recalculate them unless something has changed.
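One common way to do the caching is a dirty flag, sketched below. Everything here is hypothetical: the class `CachedCornersSketch`, its fields, and the axis-aligned corner computation, which only stands in for the real matrix work in calcPos. The `recomputeCount` field exists purely to make the effect visible.

```java
// Plain-Java sketch of dirty-flag caching; all names are hypothetical.
class CachedCornersSketch {
    private float x, y;
    private final float width, height;

    private final float[] xPos = new float[4];
    private final float[] yPos = new float[4];
    private boolean dirty = true; // true while the cached corners are stale
    int recomputeCount = 0;       // only here to observe how often we recompute

    CachedCornersSketch(float x, float y, float width, float height) {
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
    }

    // Mark the cache stale only when the position really changes.
    void setPosition(float newX, float newY) {
        if (newX != x || newY != y) {
            x = newX;
            y = newY;
            dirty = true;
        }
    }

    float[] cornersX() { refresh(); return xPos; }
    float[] cornersY() { refresh(); return yPos; }

    private void refresh() {
        if (!dirty) {
            return; // nothing moved since the last computation
        }
        // Stand-in for the expensive per-vertex transform.
        xPos[0] = x;         yPos[0] = y;
        xPos[1] = x + width; yPos[1] = y;
        xPos[2] = x;         yPos[2] = y + height;
        xPos[3] = x + width; yPos[3] = y + height;
        recomputeCount++;
        dirty = false;
    }

    public static void main(String[] args) {
        CachedCornersSketch sprite = new CachedCornersSketch(0f, 0f, 2f, 1f);
        sprite.cornersX();
        sprite.cornersY();           // served from the cache
        sprite.setPosition(5f, 5f);  // invalidates the cache
        sprite.cornersX();
        System.out.println("recomputed " + sprite.recomputeCount + " times");
    }
}
```

With this pattern, sprites that did not move during a frame cost nothing in the collision pass. If the projection matrix can change, that change would need to mark every sprite dirty as well.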

private void calcPos(int index) {

// get only once, not every loop
SpriteData sD = spriteDataArrayList.get(index);

int[] vIndices = {0, 1, 2, 5}; // the 4 verts you want

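// multiply once outside the loop, use result inside loop
// (model is applied first, so the combined matrix is projection * model;
// ideally allocate mvpMatrix once as a field instead of on every call)
float[] mvpMatrix = new float[16];
Matrix.multiplyMM(mvpMatrix, 0, mProjectionMatrix, 0, mModelMatrix, 0);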

for (int i = 0; i < 4; ++i) { // only grab verts you want, no need for fancy skips

    int nVert = 3 * vIndices[i]; // 3 floats per vert

    // should avoid copying data when you aren't going to change the copy
    rhsVec[0] = vertices[0 + nVert];
    rhsVec[1] = vertices[1 + nVert];
    rhsVec[2] = vertices[2 + nVert];

    rhsVec[3] = 1; // need to write multiplyMV3 that takes pointer to 3 floats 
                   // and fills in the w param, then no need to copy

    // E.g. (hypothetical helper that reads x, y, z from vertices at nVert):
    // multiplyMV3(resultVec2, 0, mvpMatrix, 0, vertices, nVert);

    // do both matrix multiplications at the same time
    Matrix.multiplyMV(resultVec2, 0, mvpMatrix, 0, rhsVec, 0);

    // *** Step 3 : Getting to normalized device coordinates ***
    float inv_w = 1 / resultVec2[3];

    for (int j = 0; j < 2; ++j) // just what we need
        resultVec2[j] *= inv_w;

    // Curious... Transform into projection space, just to transform 
    // back into screen space.  Perhaps you are transforming too far?
    float xPos = (resultVec2[0] * 0.5f + 0.5f) * game_width;
    float yPos = (resultVec2[1] * 0.5f + 0.5f) * game_height;
    // float zPos = (1 + resultVec2[2]) * 0.5f; // not used

    switch (i) {

        case 0:
            sD.xPos[0] = xPos;
            sD.yPos[0] = yPos;
            break;

        case 1:
            sD.xPos[2] = xPos;
            sD.yPos[2] = yPos;
            break;

        case 2:
            sD.xPos[3] = xPos;
            sD.yPos[3] = yPos;
            break;

        case 3:
            sD.xPos[1] = xPos;
            sD.yPos[1] = yPos;
            break;
    }
}
}

OpenGL ES: getting transformed vertices from the shader
