Efficient sprite rendering

Question

I'm writing some code that draws 2D sprites with D3D11 (SharpDX, for a WinRT application). It all works, but it isn't fast. Here is some of the code I'm using:

        // Setup local variables
        var d3dDevice = App.deviceManager.DeviceDirect3D;
        var d3dContext = App.deviceManager.ContextDirect3D;

        var vertices = SharpDX.Direct3D11.Buffer.Create(App.deviceManager.DeviceDirect3D, BindFlags.VertexBuffer, new[]
        {
            // Position                                  Colour                      UV
            x0 - OFFSET, y0 - OFFSET, 0.0f, 1.0f ,       1.0f, 1.0f, 1.0f, 1.0f,     u, v,
            x1 - OFFSET, y1 - OFFSET, 0.0f, 1.0f ,       1.0f, 1.0f, 1.0f, 1.0f,     u2, v,
            x3 - OFFSET, y3 - OFFSET, 0.0f, 1.0f ,       1.0f, 1.0f, 1.0f, 1.0f,     u, v2,
            x2 - OFFSET, y2 - OFFSET, 0.0f, 1.0f ,       1.0f, 1.0f, 1.0f, 1.0f,     u2, v2,
        });

        textureView = new ShaderResourceView( App.deviceManager.DeviceDirect3D, texture );

        // Setup the pipeline
        d3dContext.InputAssembler.SetVertexBuffers( 0, vertexBufferBinding );
        d3dContext.InputAssembler.InputLayout = layout;
        d3dContext.InputAssembler.PrimitiveTopology = PrimitiveTopology.TriangleStrip;
        d3dContext.VertexShader.SetConstantBuffer( 0, constantBuffer );
        d3dContext.VertexShader.Set( vertexShader );
        d3dContext.PixelShader.Set( pixelShader );

        if( textureView != null )
            d3dContext.PixelShader.SetShaderResource( 0, textureView );

        // Draw the quad
        d3dContext.Draw( 4, 0 );

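This code is called for every sprite rendered on screen. I can't make any guarantees about the lifetime of a sprite: on the next frame it may be in the same position or a different one, and there may be more sprites or fewer. According to my profiler, SharpDX.Direct3D11.Buffer.Create takes up 50% of the whole frame when rendering a large number of sprites. I assume there is a better way to do this, but I'm struggling to get my head around what it is. Can anyone offer any suggestions?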

Answer 1

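Draw calls, creating resources and changing the state of the graphics device have always been (and in my experience still are) very slow. What do I mean by changing state?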

Setting shaders
Binding buffers
Changing textures
etc.

If you want a fast Direct3D application, keep the number of times you bind things to the device to a minimum.

Efficient sprite rendering

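There is no need to call this code for every sprite at all. In fact, there are ways to draw thousands upon thousands of sprites while keeping a good frame rate and still getting other work done (physics, gameplay, rendering a 3D world). There is one general rule, which usually applies to every other rendering task as well: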

Draw as many objects as possible with the same draw call.

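That way you set the shaders, buffers, input layout and so on only once for a whole group of similar objects. Sprites are perfect for this, because different sprites have a lot in common.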

What is a sprite?

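A sprite can be described as a rectangular part of a texture that is drawn onto a rectangular part of the screen. Here is a stripped-down example of the Sprite type I use in my application: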

public struct Sprite
{
    public Color4 Color { get; set; }
    public Texture2D Texture { get; set; }
    public Rectangle Source { get; set; }
    public Rectangle Destination { get; set; }
    public float Depth { get; set; }
    public FlipMode Flip { get; set; }
}

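The Color property is used as a blend color, or as a solid fill when there is no texture. Source is the rectangular region on the texture. Destination is the region on screen where the sprite is drawn. Depending on your requirements, the rectangles can also be rotated. I have two kinds of sprites: one that can be rotated and one that can't.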

Drawing sprites

Here are some tips and ideas (in no particular order) for speeding up rendering:

Sort sprites by texture

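Simply draw the sprites that share a texture together. This leads to another important point: store your sprite images in sprite maps. Combine all the tiny textures into one larger texture and draw them all at once in the same draw call.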
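As a rough illustration of sorting by texture, here is a minimal sketch that groups a sprite list into per-texture runs, so each run could be submitted with one texture bind and one draw call. `BatchSprite`, `SpriteBatch` and `SpriteBatcher` are hypothetical names made up for this example, and an integer `TextureId` stands in for the `Texture2D` reference.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a sprite; TextureId replaces the Texture2D reference.
public readonly struct BatchSprite
{
    public BatchSprite(int textureId) { TextureId = textureId; }
    public int TextureId { get; }
}

// One draw call's worth of sprites: a texture plus a contiguous range in the sorted list.
public readonly struct SpriteBatch
{
    public SpriteBatch(int textureId, int start, int count)
    {
        TextureId = textureId;
        Start = start;
        Count = count;
    }
    public int TextureId { get; }
    public int Start { get; }
    public int Count { get; }
}

public static class SpriteBatcher
{
    // Sorts sprites by texture and returns one batch per run of equal textures.
    public static List<SpriteBatch> Build(List<BatchSprite> sprites, out List<BatchSprite> sorted)
    {
        sorted = sprites.OrderBy(s => s.TextureId).ToList(); // OrderBy is a stable sort
        var batches = new List<SpriteBatch>();
        int start = 0;
        for (int i = 1; i <= sorted.Count; i++)
        {
            // Close the current run at the end of the list or when the texture changes.
            if (i == sorted.Count || sorted[i].TextureId != sorted[start].TextureId)
            {
                batches.Add(new SpriteBatch(sorted[start].TextureId, start, i - start));
                start = i;
            }
        }
        return batches;
    }
}
```

With a sprite map, most sprites end up with the same texture, so the number of batches (and with it the number of texture binds) stays small regardless of how many sprites are on screen.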

Cull your sprites

Test a sprite for visibility before drawing it.

Is the sprite even on screen?
Is it transparent?
Is it occluded by other sprites?

To enable depth culling, also sort the sprites by depth, front first.
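The first two visibility tests and the depth sort can be sketched as follows. This is only an illustration under assumptions: `RectF`, `CullSprite` and `SpriteCuller` are hypothetical types, a smaller `Depth` is assumed to mean "closer to the viewer", and the occlusion test is left out because it depends on how opaque your sprites are.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical axis-aligned rectangle in screen pixels.
public readonly struct RectF
{
    public RectF(float x, float y, float width, float height)
    {
        X = x; Y = y; Width = width; Height = height;
    }
    public float X { get; }
    public float Y { get; }
    public float Width { get; }
    public float Height { get; }

    // True when the two rectangles overlap by more than an edge.
    public bool Intersects(RectF other) =>
        X < other.X + other.Width && other.X < X + Width &&
        Y < other.Y + other.Height && other.Y < Y + Height;
}

// Hypothetical sprite reduced to the fields the culling step needs.
public readonly struct CullSprite
{
    public CullSprite(RectF destination, float alpha, float depth)
    {
        Destination = destination; Alpha = alpha; Depth = depth;
    }
    public RectF Destination { get; }
    public float Alpha { get; }
    public float Depth { get; }
}

public static class SpriteCuller
{
    // Drops fully transparent and off-screen sprites, then sorts the rest front first.
    public static List<CullSprite> Visible(IEnumerable<CullSprite> sprites, RectF screen)
    {
        var visible = new List<CullSprite>();
        foreach (var s in sprites)
        {
            if (s.Alpha <= 0f) continue;                      // fully transparent
            if (!s.Destination.Intersects(screen)) continue;  // not on screen
            visible.Add(s);
        }
        // Assumption: smaller Depth means nearer, so ascending order is front first.
        visible.Sort((a, b) => a.Depth.CompareTo(b.Depth));
        return visible;
    }
}
```

Drawing opaque sprites front first lets the depth test reject hidden pixels early. Blended sprites usually need the opposite order, so they would have to be handled in a separate pass.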

Single vertex buffer

You basically create a quad mesh for each sprite, then write all the vertices into the same vertex buffer and draw them.

Here is the vertex for the Sprite type above:

public struct SpriteVertex
{
    public Vector3 Position { get; set; }
    public Vector2 Texcoord { get; set; }
    public Vector4 Color { get; set; }
}

The obvious downside is that all the vector math is done on the CPU. That's a shame, because the GPU can do the same work much faster.
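To make the CPU-side work concrete, here is a minimal sketch that turns a list of sprites into one vertex array and one index array, four vertices and six indices per sprite. `QuadVertex`, `QuadSprite` and `QuadBuilder` are hypothetical names; rectangles are packed as (x, y, width, height) in a `Vector4`, rotation is left out, and `System.Numerics` vectors stand in for the SharpDX ones.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

// Mirrors the SpriteVertex layout: position, texture coordinate, color.
public struct QuadVertex
{
    public Vector3 Position;
    public Vector2 Texcoord;
    public Vector4 Color;
}

// Hypothetical unrotated sprite; rectangles are (x, y, width, height).
public readonly struct QuadSprite
{
    public QuadSprite(Vector4 destination, Vector4 source, Vector4 color, float depth)
    {
        Destination = destination; Source = source; Color = color; Depth = depth;
    }
    public Vector4 Destination { get; } // screen pixels
    public Vector4 Source { get; }      // texels
    public Vector4 Color { get; }
    public float Depth { get; }
}

public static class QuadBuilder
{
    // Fills one vertex array and one index array for all sprites.
    // 16-bit indices limit a single batch to 16384 sprites.
    public static void Build(IReadOnlyList<QuadSprite> sprites, Vector2 screenSize, Vector2 textureSize,
                             out QuadVertex[] vertices, out ushort[] indices)
    {
        vertices = new QuadVertex[sprites.Count * 4];
        indices = new ushort[sprites.Count * 6];
        for (int i = 0; i < sprites.Count; i++)
        {
            QuadSprite s = sprites[i];
            for (int c = 0; c < 4; c++)
            {
                // Corner order: 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right.
                float cx = c & 1;
                float cy = c >> 1;
                float px = s.Destination.X + cx * s.Destination.Z;
                float py = s.Destination.Y + cy * s.Destination.W;
                vertices[i * 4 + c] = new QuadVertex
                {
                    // Pixels to clip space: x in [-1, 1], y flipped so that +y points up.
                    Position = new Vector3(px / screenSize.X * 2f - 1f, 1f - py / screenSize.Y * 2f, s.Depth),
                    Texcoord = new Vector2((s.Source.X + cx * s.Source.Z) / textureSize.X,
                                           (s.Source.Y + cy * s.Source.W) / textureSize.Y),
                    Color = s.Color
                };
            }
            int v = i * 4, k = i * 6;
            // Two clockwise triangles per quad.
            indices[k + 0] = (ushort)(v + 0); indices[k + 1] = (ushort)(v + 1); indices[k + 2] = (ushort)(v + 2);
            indices[k + 3] = (ushort)(v + 2); indices[k + 4] = (ushort)(v + 1); indices[k + 5] = (ushort)(v + 3);
        }
    }
}
```

The arrays would then be copied each frame into a vertex buffer that is created once and reused (for example a dynamic buffer mapped with write-discard), rather than calling Buffer.Create per sprite as in the question. That buffer setup is an assumption about how you would wire this up and is not shown here.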

Hardware instancing

You create two vertex buffers:

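The first contains the 4 vertices that form a quad, each with a 2D position and a texture coordinate.
The second contains all the sprite data, one struct per sprite.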

The first buffer is simply bound as the main vertex buffer, while the second is used as the instance buffer.

Here is the vertex shader I wrote for this approach:

struct Vertex
{
    float2 Position     : POSITION0;
    float2 Texcoord     : TEXCOORD0;
};

struct Sprite
{
    float4 Color        : COLOR0;
    float4 DestinationA : TEXCOORD1;
    float4 DestinationB : TEXCOORD2;
    float4 SourceA      : TEXCOORD3;
    float4 SourceB      : TEXCOORD4;
    float4 Other        : TEXCOORD5;
};

struct Pixel
{
    float4 Position     : SV_POSITION;
    float2 Texcoord     : TEXCOORD0;
    float4 Color        : COLOR0;
};

cbuffer Screen      : register(b0)
{
    float2 ScreenSize;
};

cbuffer Texture     : register(b1)
{
    float2 TextureSize;
};

float2 Transform(float2 position, float2 scale, float2 origin, float angle, float2 translation)
{
    // Create a 2D rotation matrix
    float2x2 rotation = float2x2(cos(angle), sin(angle), -sin(angle), cos(angle));

    // Apply the scale, origin, rotation and translation
    return mul(position * scale - origin, rotation) + translation;
}

Pixel Main(Vertex vertex, Sprite sprite)
{
    // Transform the vertex position
    float2 position = Transform(
        vertex.Position, 
        sprite.DestinationA.zw,
        sprite.DestinationB.xy,
        sprite.DestinationB.z,
        sprite.DestinationA.xy
    );

    // Bring to screen space
    position.x = position.x / ScreenSize.x * 2 - 1;
    position.y = 1 - position.y / ScreenSize.y * 2;

    float2 texcoord = vertex.Texcoord;

    // Flip the Y axis for the texture coordinates before the transformation
    texcoord.y = 1 - texcoord.y;

    // Transform the texcoords
    texcoord = Transform(
        texcoord, 
        sprite.SourceA.zw,
        sprite.SourceB.xy,
        sprite.SourceB.z,
        sprite.SourceA.xy
    );

    // Bring to texture space
    texcoord.x = texcoord.x / TextureSize.x;
    texcoord.y = texcoord.y / TextureSize.y;

    // Flip the texture coordinates
    float texcoordFlip = sprite.Other.z;
    if (texcoordFlip == 1 || texcoordFlip == 3) texcoord.x = 1 - texcoord.x;
    if (texcoordFlip == 2 || texcoordFlip == 3) texcoord.y = 1 - texcoord.y;

    // Create the output struct
    Pixel pixel = (Pixel)0;
    pixel.Position = float4(position, sprite.Other.x, 1);
    pixel.Texcoord = texcoord;
    pixel.Color = float4(sprite.Color);

    return pixel;
}
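On the CPU side, each element of the instance buffer has to match the shader's Sprite input. The following is a sketch of what that could look like, read off from how the shader uses each field: DestinationA.xy and SourceA.xy are translations, the .zw parts are scales, DestinationB.xy and SourceB.xy are origins, the .z parts are angles, Other.x is the depth and Other.z is the flip mode. `SpriteInstance` and its members are hypothetical names, the remaining components are assumed to be unused, and `Transform` is a CPU mirror of the shader function included only to show the math.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

// One element of the instance buffer: six float4 values, 96 bytes.
[StructLayout(LayoutKind.Sequential)]
public struct SpriteInstance
{
    public Vector4 Color;
    public Vector4 DestinationA; // xy = translation, zw = scale
    public Vector4 DestinationB; // xy = origin, z = angle (w assumed unused)
    public Vector4 SourceA;      // xy = translation, zw = scale
    public Vector4 SourceB;      // xy = origin, z = angle (w assumed unused)
    public Vector4 Other;        // x = depth, z = flip mode (y, w assumed unused)

    public const int SizeInBytes = 6 * 4 * sizeof(float);

    // Packs an unrotated source rectangle and a rotatable destination rectangle.
    public static SpriteInstance Create(Vector4 color, Vector2 position, Vector2 size,
                                        Vector2 origin, float angle,
                                        Vector2 sourcePosition, Vector2 sourceSize,
                                        float depth, int flip)
    {
        return new SpriteInstance
        {
            Color = color,
            DestinationA = new Vector4(position.X, position.Y, size.X, size.Y),
            DestinationB = new Vector4(origin.X, origin.Y, angle, 0f),
            SourceA = new Vector4(sourcePosition.X, sourcePosition.Y, sourceSize.X, sourceSize.Y),
            SourceB = Vector4.Zero,
            Other = new Vector4(depth, 0f, flip, 0f)
        };
    }

    // CPU mirror of the shader's Transform: scale, shift by origin, rotate, translate.
    public static Vector2 Transform(Vector2 p, Vector2 scale, Vector2 origin, float angle, Vector2 translation)
    {
        float c = (float)Math.Cos(angle);
        float s = (float)Math.Sin(angle);
        Vector2 v = p * scale - origin;
        // Row vector times the matrix [[c, s], [-s, c]], as in mul(v, rotation).
        return new Vector2(v.X * c - v.Y * s, v.X * s + v.Y * c) + translation;
    }
}
```

Because the shader multiplies the quad position by the scale, the four quad vertices are presumably a unit square. The whole set of sprites can then be submitted with a single instanced draw of 4 vertices and one instance per sprite, with the six per-sprite input elements declared as per-instance data in the input layout.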

Geometry shader

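Another approach involves the geometry shader stage. Just as with the instancing approach, you create a vertex buffer containing the sprite structs. This time, though, you bind it as the main vertex buffer (no quad needed) and set the topology to a point list.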

In the geometry shader, you then generate the four vertices from the sprite information.

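I tested this approach once but quickly went back to hardware instancing. As far as I could tell it didn't give much of a performance boost, but I may have missed something.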

Good luck!
