我想将一个最先进的哈希函数MeiYan从C转移到Go。 (据我所知,这是最好的,不仅仅是哈希表在速度和冲突率方面最好的哈希函数,它至少胜过MurMur。)
我是Go的新手,只花了一个周末,并提出了这个版本:
func meiyan(key *byte, count int) uint32 {
type P *uint32;
var h uint32 = 0x811c9dc5;
for ;count >= 8; {
a := ((*(*uint32)(unsafe.Pointer(key))) << 5)
b := ((*(*uint32)(unsafe.Pointer(key))) >> 27)
c := *(*uint32)(unsafe.Pointer(uintptr(unsafe.Pointer(key)) + 4))
h = (h ^ ((a | b) ^ c)) * 0xad3e7
count -= 8
key = (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(key)) + 8))
}
if (count & 4) != 0 {
h = (h ^ uint32(*(*uint16)(unsafe.Pointer(key)))) * 0xad3e7
key = (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(key)) + 2))
h = (h ^ uint32(*(*uint16)(unsafe.Pointer(key)))) * 0xad3e7
key = (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(key)) + 2))
}
if (count & 2) != 0 {
h = (h ^ uint32(*(*uint16)(unsafe.Pointer(key)))) * 0xad3e7
key = (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(key)) + 2))
}
if (count & 1) != 0 {
h = (h ^ uint32(*key));
h = h * 0xad3e7
}
return h ^ (h >> 16);
}
看起来很乱,但我认为我不能让它看起来更好。现在我测量速度并且速度令人沮丧,比使用gccgo -O3
编译时C / C ++慢3倍。这可以更快吗?这和编译器一样好,或者unsafe.Pointer
转换速度和它一样慢吗?事实上,这让我感到惊讶,因为我已经看到其他一些数字运算风格代码与C一样快,甚至更快。我在这里做了什么吗?
这是我移植的原始C代码:
u32 meiyan(const char *key, int count) {
typedef u32* P;
u32 h = 0x811c9dc5;
while (count >= 8) {
h = (h ^ ((((*(P)key) << 5) | ((*(P)key) >> 27)) ^ *(P)(key + 4))) * 0xad3e7;
count -= 8;
key += 8;
}
#define tmp h = (h ^ *(u16*)key) * 0xad3e7; key += 2;
if (count & 4) { tmp tmp }
if (count & 2) { tmp }
if (count & 1) { h = (h ^ *key) * 0xad3e7; }
#undef tmp
return h ^ (h >> 16);
}
以下是我测量速度的方法:
func main(){
T := time.Now().UnixNano()/1e6
buf := []byte("Hello World!")
var controlSum uint64 = 0
for x := 123; x < 1e8; x++ {
controlSum += uint64(meiyan(&buf[0], 12))
}
fmt.Println(time.Now().UnixNano()/1e6 - T, "ms")
fmt.Println("controlSum:", controlSum)
}
答案 0 :(得分:3)
经过一些仔细研究后,我发现了为什么我的代码很慢,并且对它进行了改进,所以它现在比我测试中的C版本更快:
package main
import (
"fmt"
"time"
"unsafe"
)
func meiyan(key *byte, count int) uint32 {
type un unsafe.Pointer
type p32 *uint32
type p16 *uint16
type p8 *byte
var h uint32 = 0x811c9dc5;
for ;count >= 8; {
a := *p32(un(key)) << 5
b := *p32(un(key)) >> 27
c := *p32(un(uintptr(un(key)) + 4))
h = (h ^ ((a | b) ^ c)) * 0xad3e7
count -= 8
key = p8(un(uintptr(un(key)) + 8))
}
if (count & 4) != 0 {
h = (h ^ uint32(*p16(un(key)))) * 0xad3e7
key = p8(un(uintptr(un(key)) + 2))
h = (h ^ uint32(*p16(un(key)))) * 0xad3e7
key = p8(un(uintptr(un(key)) + 2))
}
if (count & 2) != 0 {
h = (h ^ uint32(*p16(un(key)))) * 0xad3e7
key = p8(un(uintptr(un(key)) + 2))
}
if (count & 1) != 0 {
h = h ^ uint32(*key)
h = h * 0xad3e7
}
return h ^ (h >> 16);
}
func main() {
T := time.Now().UnixNano()/1e6
buf := []byte("ABCDEFGHABCDEFGH")
var controlSum uint64 = 0
start := &buf[0]
size := len(buf)
for x := 123; x < 1e8; x++ {
controlSum += uint64(meiyan(start, size))
}
fmt.Println(time.Now().UnixNano()/1e6 - T, "ms")
fmt.Println("controlSum:", controlSum)
}
哈希函数本身已经很快,但是在每次迭代时取消引用数组都会使它变慢:&buf[0]
被start := &buf[0]
替换,然后在每次迭代时使用start
。 / p>
答案 1 :(得分:1)
来自NATS的实施看起来令人印象深刻!在我的机器上,数据长度为30(字节)操作/秒157175656.56 和纳秒/操作6.36 !看看吧。你可能会发现一些想法。