Porting the MeiYan hash function to Go

Time: 2017-02-28 11:41:09

Tags: performance go hash

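I want to port a state-of-the-art hash function, MeiYan, from C to Go. (As far as I know, it is one of the best hash functions for hash tables in terms of speed and collision rate, if not the best; it beats MurMur at least.)

I am new to Go, having spent only one weekend with it, and I came up with this version: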

func meiyan(key *byte, count int) uint32 {
    type P *uint32
    var h uint32 = 0x811c9dc5
    for count >= 8 {
        a := ((*(*uint32)(unsafe.Pointer(key))) << 5)
        b := ((*(*uint32)(unsafe.Pointer(key))) >> 27)
        c := *(*uint32)(unsafe.Pointer(uintptr(unsafe.Pointer(key)) + 4))
        h = (h ^ ((a | b) ^ c)) * 0xad3e7
        count -= 8
        key = (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(key)) + 8))
    }
    if (count & 4) != 0 {
        h = (h ^ uint32(*(*uint16)(unsafe.Pointer(key)))) * 0xad3e7
        key = (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(key)) + 2))
        h = (h ^ uint32(*(*uint16)(unsafe.Pointer(key)))) * 0xad3e7
        key = (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(key)) + 2))
    }
    if (count & 2) != 0 {
        h = (h ^ uint32(*(*uint16)(unsafe.Pointer(key)))) * 0xad3e7
        key = (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(key)) + 2))
    }
    if (count & 1) != 0 {
        h = (h ^ uint32(*key)) * 0xad3e7
    }
    return h ^ (h >> 16)
}

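It looks messy, but I don't think I can make it look any better. I have now measured the speed and the result is frustrating: it is 3 times slower than the C/C++ version, even when compiled with gccgo -O3. Can this be made faster? Is this as good as the compiler can make it, or is the unsafe.Pointer conversion simply this slow? This surprised me, because I have seen other number-crunching code in Go run as fast as C, or even faster. Am I doing something inefficiently here?

Here is the original C code I am porting from: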

u32 meiyan(const char *key, int count) {
    typedef u32* P;
    u32 h = 0x811c9dc5;
    while (count >= 8) {
        h = (h ^ ((((*(P)key) << 5) | ((*(P)key) >> 27)) ^ *(P)(key + 4))) * 0xad3e7;
        count -= 8;
        key += 8;
    }
    #define tmp h = (h ^ *(u16*)key) * 0xad3e7; key += 2;
    if (count & 4) { tmp tmp }
    if (count & 2) { tmp }
    if (count & 1) { h = (h ^ *key) * 0xad3e7; }
    #undef tmp
    return h ^ (h >> 16);
}

Here is how I measure the speed:

func main(){
    T := time.Now().UnixNano()/1e6
    buf := []byte("Hello World!")
    var controlSum uint64 = 0
    for x := 123; x < 1e8; x++ {
        controlSum += uint64(meiyan(&buf[0], 12))
    }
    fmt.Println(time.Now().UnixNano()/1e6 - T, "ms")
    fmt.Println("controlSum:", controlSum)
}

2 Answers:

Answer 0: (score: 3)

After some careful research I found out why my code was slow, and I improved it so that it is now faster than the C version in my tests:

package main

import (
    "fmt"
    "time"
    "unsafe"
)

func meiyan(key *byte, count int) uint32 {
    type un unsafe.Pointer
    type p32 *uint32
    type p16 *uint16
    type p8 *byte
    var h uint32 = 0x811c9dc5
    for count >= 8 {
        a := *p32(un(key)) << 5
        b := *p32(un(key)) >> 27
        c := *p32(un(uintptr(un(key)) + 4))
        h = (h ^ ((a | b) ^ c)) * 0xad3e7
        count -= 8
        key = p8(un(uintptr(un(key)) + 8))
    }
    if (count & 4) != 0 {
        h = (h ^ uint32(*p16(un(key)))) * 0xad3e7
        key = p8(un(uintptr(un(key)) + 2))
        h = (h ^ uint32(*p16(un(key)))) * 0xad3e7
        key = p8(un(uintptr(un(key)) + 2))
    }
    if (count & 2) != 0 {
        h = (h ^ uint32(*p16(un(key)))) * 0xad3e7
        key = p8(un(uintptr(un(key)) + 2))
    }
    if (count & 1) != 0 {
        h = h ^ uint32(*key)
        h = h * 0xad3e7
    }
    return h ^ (h >> 16)
}

func main() {
    T := time.Now().UnixNano()/1e6
    buf := []byte("ABCDEFGHABCDEFGH")
    var controlSum uint64 = 0
    start := &buf[0]
    size := len(buf)
    for x := 123; x < 1e8; x++ {
        controlSum += uint64(meiyan(start, size))
    }
    fmt.Println(time.Now().UnixNano()/1e6 - T, "ms")
    fmt.Println("controlSum:", controlSum)
}

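The hash function itself was already fast; what made the benchmark slow was indexing the slice on every iteration. I replaced the &buf[0] inside the loop with start := &buf[0] before the loop, and then used start on each iteration.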
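For readers who would rather avoid the unsafe.Pointer arithmetic altogether, the same algorithm can be sketched over a []byte using encoding/binary. This is a hypothetical variant and not the code measured above: the name meiyanSafe is made up for illustration, it assumes the little-endian byte order that the pointer casts read on x86, and no speed claim is made for it.

```go
package main

import (
	"encoding/binary"
	"math/bits"
)

// meiyanSafe is a hypothetical, bounds-checked port of the same algorithm.
// Assumption: little-endian loads, matching what the pointer casts in the
// unsafe version read on x86.
func meiyanSafe(key []byte) uint32 {
	const prime = 0xad3e7
	h := uint32(0x811c9dc5)

	// Main loop: consume 8 bytes per iteration as two 32-bit words.
	for len(key) >= 8 {
		a := binary.LittleEndian.Uint32(key)
		b := binary.LittleEndian.Uint32(key[4:])
		h = (h ^ (bits.RotateLeft32(a, 5) ^ b)) * prime
		key = key[8:]
	}

	// Tail: fewer than 8 bytes remain, handled as 4 + 2 + 1.
	if len(key)&4 != 0 {
		h = (h ^ uint32(binary.LittleEndian.Uint16(key))) * prime
		h = (h ^ uint32(binary.LittleEndian.Uint16(key[2:]))) * prime
		key = key[4:]
	}
	if len(key)&2 != 0 {
		h = (h ^ uint32(binary.LittleEndian.Uint16(key))) * prime
		key = key[2:]
	}
	if len(key)&1 != 0 {
		h = (h ^ uint32(key[0])) * prime
	}
	return h ^ (h >> 16)
}
```

It would be called as meiyanSafe(buf) with the slice itself, so the caller no longer passes a pointer and a length separately. Whether it matches the speed of the unsafe version depends on the compiler and would need to be measured.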

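Answer 1: (score: 1)

The implementation from NATS looks impressive! On my machine, for data of length 30 (bytes), it reaches 157175656.56 ops/sec at 6.36 ns/op! Take a look at it. You may find some ideas there.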
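Figures such as ns/op are the kind of output Go's standard testing package reports. A minimal sketch of how to collect them for any hash function follows. The helper names measure and hashFNV are hypothetical, and the workload is FNV-1a from the standard library as a stand-in, not MeiYan or the NATS code.

```go
package main

import (
	"hash/fnv"
	"testing"
)

// sink receives every result so the compiler cannot discard the hash call.
var sink uint32

// hashFNV is a stand-in workload (FNV-1a from the standard library).
// It is only here to make the sketch self-contained.
func hashFNV(data []byte) uint32 {
	h := fnv.New32a()
	h.Write(data)
	return h.Sum32()
}

// measure runs fn repeatedly under testing.Benchmark, which picks the
// iteration count itself, and returns the collected result.
func measure(fn func([]byte) uint32, data []byte) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		b.SetBytes(int64(len(data)))
		for i := 0; i < b.N; i++ {
			sink = fn(data)
		}
	})
}
```

Calling r := measure(hashFNV, make([]byte, 30)) and printing r.NsPerOp() gives a nanoseconds-per-operation figure for a 30-byte input. Any function with the same signature can be passed in place of hashFNV.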