Question

I am currently making a game in Go and measuring the FPS. I noticed a loss of about 7 FPS when appending to slices in a for loop like this:

vertexInfo := Opengl.OpenGLVertexInfo{}

for i := 0; i < 4; i = i + 1 {
    vertexInfo.Translations = append(vertexInfo.Translations, float32(s.x), float32(s.y), 0)
    vertexInfo.Rotations = append(vertexInfo.Rotations, 0, 0, 1, s.rot)
    vertexInfo.Scales = append(vertexInfo.Scales, s.xS, s.yS, 0)
    vertexInfo.Colors = append(vertexInfo.Colors, s.r, s.g, s.b, s.a)

}

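I do this for every sprite, on every draw. The question is: why do I take such a large performance hit just from looping four times and appending the same values to these slices? Is there a more efficient way to do this? It's not as if I'm adding a large amount of data. Each slice holds about 16 elements, as shown above (4 x 4).

When I simply put all 16 elements into a single []float32{1..16}, the FPS improves by about 4.

Update: I benchmarked each append, and it looks like each one costs about 1 FPS. That seems like a lot considering this data is fairly static and I only need 4 iterations.

Update: Added the GitHub repo: https://github.com/Triangle345/GT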

Answer 1

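The built-in append() has to create a new backing array if the capacity of the destination slice is smaller than the length the slice will have after the append. It then also has to copy the existing elements from the old array into the newly allocated one, which is where the overhead comes from.

The slices you append to are most likely empty, since you used a composite literal to create the Opengl.OpenGLVertexInfo value. Even though append() plans ahead and allocates a larger array than the appended elements strictly need, in your case it will probably still take several reallocations to get through the 4 iterations.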
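To see the reallocations rather than infer them, a minimal sketch like the one below can count how often the capacity changes while appending. The helper countGrowths is hypothetical and only mirrors the shape of the loop in the question (4 groups of 3 values); the exact count for the nil slice depends on the growth strategy of the Go version in use, so no specific number is claimed for it.

```go
package main

import "fmt"

// countGrowths appends n groups of `per` zero values to s and returns how
// many times the capacity changed, i.e. how often append had to allocate
// a new backing array and copy the existing elements into it.
// (Hypothetical helper, not part of the question's code.)
func countGrowths(s []float32, n, per int) int {
	group := make([]float32, per)
	growths := 0
	for i := 0; i < n; i++ {
		before := cap(s)
		s = append(s, group...)
		if cap(s) != before {
			growths++
		}
	}
	return growths
}

func main() {
	// Starting from a nil slice, as in the question's loop.
	fmt.Println("nil slice:", countGrowths(nil, 4, 3))
	// Starting from an empty slice that already has room for all 12 values.
	fmt.Println("preallocated:", countGrowths(make([]float32, 0, 12), 4, 3))
}
```

The nil slice has to grow at least once, since it starts with no backing array at all. The preallocated slice never grows, because all 12 values fit into the capacity it was created with.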

You can avoid the reallocations if you create and initialize vertexInfo like this:

vertexInfo := Opengl.OpenGLVertexInfo{
    Translations: []float32{float32(s.x), float32(s.y), 0, float32(s.x), float32(s.y), 0, float32(s.x), float32(s.y), 0, float32(s.x), float32(s.y), 0},
    Rotations:    []float64{0, 0, 1, s.rot, 0, 0, 1, s.rot, 0, 0, 1, s.rot, 0, 0, 1, s.rot},
    Scales:       []float64{s.xS, s.yS, 0, s.xS, s.yS, 0, s.xS, s.yS, 0, s.xS, s.yS, 0},
    Colors:       []float64{s.r, s.g, s.b, s.a, s.r, s.g, s.b, s.a, s.r, s.g, s.b, s.a, s.r, s.g, s.b, s.a},
}

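Note that this struct literal ensures the arrays behind the slices do not have to be reallocated. However, if you append more elements to these slices elsewhere in your code (which we cannot see), those appends may still cause reallocations. If that is the case, create the slices with a larger capacity that covers the "future" appends (e.g. make([]float64, 16, 32)).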
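If you would rather keep the loop than write out every element, one option is to reserve the capacity up front with make and then append as before. This is only a sketch: vertexInfo and sprite below are hypothetical stand-ins for Opengl.OpenGLVertexInfo and the sprite type, whose real definitions are not shown in the question, and only two of the four fields are included.

```go
package main

import "fmt"

// vertexInfo is a hypothetical stand-in for Opengl.OpenGLVertexInfo;
// the real field types are not shown in the question.
type vertexInfo struct {
	Translations []float32
	Colors       []float32
}

// sprite is a hypothetical sprite with only the fields used below.
type sprite struct {
	x, y       float32
	r, g, b, a float32
}

const verticesPerSprite = 4

// build reserves the final capacity of each slice up front, so none of
// the appends in the loop has to grow a backing array.
func build(s sprite) vertexInfo {
	v := vertexInfo{
		Translations: make([]float32, 0, verticesPerSprite*3),
		Colors:       make([]float32, 0, verticesPerSprite*4),
	}
	for i := 0; i < verticesPerSprite; i++ {
		v.Translations = append(v.Translations, s.x, s.y, 0)
		v.Colors = append(v.Colors, s.r, s.g, s.b, s.a)
	}
	return v
}

func main() {
	v := build(sprite{x: 1, y: 2, r: 1, g: 1, b: 1, a: 1})
	fmt.Println(len(v.Translations), cap(v.Translations)) // 12 12
	fmt.Println(len(v.Colors), cap(v.Colors))             // 16 16
}
```

Length and capacity end up equal, which shows that the slices were filled without ever outgrowing the arrays allocated by make.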

Answer 2

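An empty slice is empty. To append to it, memory has to be allocated. Then you append more, and even more memory has to be allocated.

To speed things up, use a fixed-size array, or use make to create a slice with the correct length, or initialize the slice with its items when you declare it.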
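As a rough illustration of the first two suggestions, the sketch below fills a fixed-size array and a slice created at its final length, both by index instead of with append. The names quadColors, fillColors and fillSlice are made up for this example, and the 4 x RGBA layout is assumed from the question.

```go
package main

import "fmt"

// quadColors is a hypothetical fixed-size layout: 4 vertices x RGBA.
type quadColors [16]float32

// fillColors writes the same RGBA value for each of the 4 vertices.
// An array has a fixed size, so nothing here can trigger a reallocation.
func fillColors(r, g, b, a float32) quadColors {
	var c quadColors
	for i := 0; i < len(c); i += 4 {
		c[i], c[i+1], c[i+2], c[i+3] = r, g, b, a
	}
	return c
}

// fillSlice does the same with a slice created at its final length
// and filled by index, so append is never called.
func fillSlice(r, g, b, a float32) []float32 {
	c := make([]float32, 16)
	for i := 0; i < len(c); i += 4 {
		c[i], c[i+1], c[i+2], c[i+3] = r, g, b, a
	}
	return c
}

func main() {
	arr := fillColors(1, 0.5, 0.25, 1)
	fmt.Println(len(arr), arr[:4]) // 16 [1 0.5 0.25 1]

	sl := fillSlice(1, 0.5, 0.25, 1)
	fmt.Println(len(sl), sl[12:]) // 16 [1 0.5 0.25 1]
}
```

An array can be passed to code that expects a slice by slicing it (arr[:]), which does not copy the elements.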

Bad performance on append... why?
