Bitwise operation performance mystery

Date: 2018-08-22 18:08:01

Tags: performance go

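While benchmarking the conversion from a byte slice to a uint32, I noticed that the conversion is faster when it starts from the least significant byte: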

package blah

import (
    "testing"
    "encoding/binary"
    "bytes"
)

func BenchmarkByteConversion(t *testing.B) {
    var i uint32 = 3419234848
    buf := new(bytes.Buffer)
    _ = binary.Write(buf, binary.BigEndian, i)
    b := buf.Bytes()

    for n := 0; n < t.N; n++ {
        // Start with the least significant byte: 0.27 ns/op
        value := uint32(b[3]) | uint32(b[2])<<8 | uint32(b[1])<<16 | uint32(b[0])<<24

        // Start with the most significant byte: 0.68 ns/op
        // value := uint32(b[0])<<24 | uint32(b[1])<<16 | uint32(b[2])<<8 | uint32(b[3])
        _ = value
    }
}

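When I run go test -bench=., computing value the first way takes 0.27 nanoseconds per iteration, while computing it the second way takes 0.68 nanoseconds per iteration. Why is it more than twice as fast to start from the least significant byte when OR-ing (|) the bytes together?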

1 answer:

Answer 0 (score: -1)

There is no mystery. It's an optimization!


The two variants as separate benchmarks:

package blah

import (
    "bytes"
    "encoding/binary"
    "testing"
)

func BenchmarkByteConversionLeast(t *testing.B) {
    var i uint32 = 3419234848
    buf := new(bytes.Buffer)
    _ = binary.Write(buf, binary.BigEndian, i)
    b := buf.Bytes()

    for n := 0; n < t.N; n++ {
        // Start with the least significant byte: 0.27 ns/op
        value := uint32(b[3]) | uint32(b[2])<<8 | uint32(b[1])<<16 | uint32(b[0])<<24
        _ = value
    }
}

func BenchmarkByteConversionMost(t *testing.B) {
    var i uint32 = 3419234848
    buf := new(bytes.Buffer)
    _ = binary.Write(buf, binary.BigEndian, i)
    b := buf.Bytes()

    for n := 0; n < t.N; n++ {
        // Start with the most significant byte: 0.68 ns/op
        value := uint32(b[0])<<24 | uint32(b[1])<<16 | uint32(b[2])<<8 | uint32(b[3])
        _ = value
    }
}

It should be obvious: bounds-check elimination.


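Just use common sense. b is a slice whose length is only known at run time, so every index expression b[i] needs a bounds check unless the compiler can prove that the index is in range. If the indices are accessed in the order 3, 2, 1, 0, only index 3 has to be checked: once that check passes, len(b) > 3, so indices 2, 1 and 0 are obviously valid as well. If the indices are accessed in the order 0, 1, 2, 3, each passing check says nothing about the next, larger index, so all four must be checked. One bounds check versus four.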
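The same reasoning suggests how to keep the most-significant-first order without paying for four checks: index the highest position once, up front. A minimal sketch, assuming the input is a 4-byte big-endian slice; the function names decodeLSBFirst and decodeMSBFirst are hypothetical:

```go
package main

import "fmt"

// decodeLSBFirst reads b[3] first. After that single check the compiler
// knows len(b) > 3, so b[2], b[1] and b[0] need no further checks.
func decodeLSBFirst(b []byte) uint32 {
	return uint32(b[3]) | uint32(b[2])<<8 | uint32(b[1])<<16 | uint32(b[0])<<24
}

// decodeMSBFirst reads the bytes most significant first. The blank
// assignment performs one up-front bounds check (panicking if len(b) < 4),
// which lets the compiler prove b[0] through b[3] are in range.
func decodeMSBFirst(b []byte) uint32 {
	_ = b[3] // bounds-check hint to the compiler
	return uint32(b[0])<<24 | uint32(b[1])<<16 | uint32(b[2])<<8 | uint32(b[3])
}

func main() {
	b := []byte{0xCB, 0xCD, 0x62, 0x20} // 3419234848 in big-endian order
	fmt.Println(decodeLSBFirst(b), decodeMSBFirst(b))
}
```

With the gc toolchain, building with -gcflags=-d=ssa/check_bce/debug=1 should list the index expressions that still carry a bounds check, which is one way to confirm the effect on a given Go version.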

Wikipedia: Bounds checking

Wikipedia: Bounds-checking elimination


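You should also read good code, such as the Go standard library. For example, binary.BigEndian.Uint32 in encoding/binary begins with _ = b[3], a bounds-check hint to the compiler, and then combines the bytes starting from b[3].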
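A short sketch of leaning on the standard library instead of hand-written shifts: binary.BigEndian.PutUint32 and binary.BigEndian.Uint32 encode and decode a big-endian uint32 directly, without the bytes.Buffer and binary.Write round trip used in the benchmark setup. The helper names encode and decode are hypothetical:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encode writes v into a fresh 4-byte slice, most significant byte first.
func encode(v uint32) []byte {
	b := make([]byte, 4)
	binary.BigEndian.PutUint32(b, v)
	return b
}

// decode reads a big-endian uint32 from the first 4 bytes of b.
// Like the hand-written versions, it panics if len(b) < 4.
func decode(b []byte) uint32 {
	return binary.BigEndian.Uint32(b)
}

func main() {
	b := encode(3419234848)
	fmt.Printf("% x\n", b) // prints "cb cd 62 20"
	fmt.Println(decode(b)) // prints 3419234848
}
```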

Output of the two benchmarks above:

go test silly_test.go -bench=.
goos: linux
goarch: amd64
BenchmarkByteConversionLeast-4      2000000000           0.72 ns/op
BenchmarkByteConversionMost-4       2000000000           1.80 ns/op
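A caveat on reading these numbers: because value is never used, the compiler can drop the shifts and ORs entirely, while the bounds checks stay because they may panic. Sub-nanosecond timings like these are consistent with loops that do little beyond those checks. The sketch below keeps the result live by storing it in a package-level variable; the names input, sink, benchLSBFirst and benchMSBFirst, the literal input slice in place of the bytes.Buffer setup, and the use of testing.Benchmark from a main program are illustrative choices, not part of the original answer:

```go
package main

import (
	"fmt"
	"testing"
)

// input holds 3419234848 in big-endian order. Because it is a mutable
// package-level variable, the compiler cannot assume its length.
var input = []byte{0xCB, 0xCD, 0x62, 0x20}

// sink receives every decoded value, so the shifts and ORs are not dead code.
var sink uint32

// benchLSBFirst indexes b[3] first, as in the question's fast variant.
func benchLSBFirst(tb *testing.B) {
	b := input
	for n := 0; n < tb.N; n++ {
		sink = uint32(b[3]) | uint32(b[2])<<8 | uint32(b[1])<<16 | uint32(b[0])<<24
	}
}

// benchMSBFirst indexes b[0] first, as in the question's slow variant.
func benchMSBFirst(tb *testing.B) {
	b := input
	for n := 0; n < tb.N; n++ {
		sink = uint32(b[0])<<24 | uint32(b[1])<<16 | uint32(b[2])<<8 | uint32(b[3])
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of go test and
	// returns a BenchmarkResult whose String method reports ns/op.
	fmt.Println("LSBFirst", testing.Benchmark(benchLSBFirst))
	fmt.Println("MSBFirst", testing.Benchmark(benchMSBFirst))
}
```

No timings are claimed here; the point is only that, with the result kept live, the comparison measures the decoding work as well as the bounds checks.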