Porting Kafka's murmur2 implementation to Go

Time: 2018-02-02 12:21:15

Tags: java go hash encoding apache-kafka

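Kafka's JVM client uses a custom implementation of the murmur2 hash in its default partitioner.

None of the Kafka clients for Go implements this hashing algorithm, which causes all sorts of problems when you need consistent partitioning across different clients on different platforms.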
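To illustrate why the hash has to match exactly, here is a minimal sketch of how a keyed record's partition is derived from the hash. It assumes the behaviour of Kafka 1.0.0's default partitioner (clear the sign bit of the murmur2 result, then take it modulo the partition count); `partitionFor` is a hypothetical helper name, and the partition count of 6 is only an example.

```go
package main

import "fmt"

// toPositive clears the sign bit, mirroring Kafka's Utils.toPositive.
func toPositive(n int32) int32 {
	return n & 0x7fffffff
}

// partitionFor is a hypothetical helper: it maps an already computed
// murmur2 hash to a partition index, assuming the default partitioner's
// rule of toPositive(hash) % numPartitions.
func partitionFor(hash, numPartitions int32) int32 {
	return toPositive(hash) % numPartitions
}

func main() {
	// Hash values taken from the test cases in this question.
	fmt.Println(partitionFor(-973932308, 6))
	fmt.Println(partitionFor(479470107, 6))
}
```

If two clients disagree on the hash of a key, they will generally disagree on the partition as well, so records with the same key end up on different partitions.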

I tried to port Kafka's implementation to Go. It seems to work for some values, but not for others.

Here is the Java code (source: https://github.com/apache/kafka/blob/1.0.0/clients/src/main/java/org/apache/kafka/common/utils/Utils.java#L353 ):

public static int murmur2(final byte[] data) {
    int length = data.length;
    int seed = 0x9747b28c;
    // 'm' and 'r' are mixing constants generated offline.
    // They're not really 'magic', they just happen to work well.
    final int m = 0x5bd1e995;
    final int r = 24;

    // Initialize the hash to a random value
    int h = seed ^ length;
    int length4 = length / 4;

    for (int i = 0; i < length4; i++) {
        final int i4 = i * 4;
        int k = (data[i4 + 0] & 0xff) + ((data[i4 + 1] & 0xff) << 8) + ((data[i4 + 2] & 0xff) << 16) + ((data[i4 + 3] & 0xff) << 24);
        k *= m;
        k ^= k >>> r;
        k *= m;
        h *= m;
        h ^= k;
    }

    // Handle the last few bytes of the input array
    switch (length % 4) {
        case 3:
            h ^= (data[(length & ~3) + 2] & 0xff) << 16;
        case 2:
            h ^= (data[(length & ~3) + 1] & 0xff) << 8;
        case 1:
            h ^= data[length & ~3] & 0xff;
            h *= m;
    }

    h ^= h >>> 13;
    h *= m;
    h ^= h >>> 15;

    return h;
}

Here is the Go code (playground link: https://play.golang.org/p/K4VooLZ4Mp7):

package main

import "fmt"

func main() {
    cases := []struct {
        Input    []byte
        Expected int32
    }{
        {[]byte("21"), -973932308},
        {[]byte("foobar"), -790332482}, // outputs: 1518714010
        {[]byte("a-little-bit-long-string"), -985981536}, // outputs 2068422364
        {[]byte("a-little-bit-longer-string"), -1486304829}, // outputs 1797390322
        {[]byte("lkjh234lh9fiuh90y23oiuhsafujhadof229phr9h19h89h8"), -58897971}, // outputs -1332218133
        {[]byte{'a', 'b', 'c'}, 479470107},
    }

    for _, c := range cases {
        if res := murmur2(c.Input); res != c.Expected {
            fmt.Printf("input: %q, expected: %d, got: %d\n", c.Input, c.Expected, res)
        }
    }
}

func murmur2(data []byte) int32 {
    length := int32(len(data))
    seed := uint32(0x9747b28c)
    m := int32(0x5bd1e995)
    r := uint32(24)

    h := int32(seed ^ uint32(length))
    length4 := length / 4

    for i := int32(0); i < length4; i++ {
        i4 := i * 4
        k := int32(data[i4+0]&0xff) + (int32(data[i4+1]&0xff) << 8) + (int32(data[i4+2]&0xff) << 16) + (int32(data[i4+3]&0xff) << 24)
        k ^= int32(uint32(k) >> r)
        k *= m
        h *= m
        h ^= k
    }

    switch length % 4 {
    case 3:
        h ^= int32(data[(length & ^3)+2]&0xff) << 16
        fallthrough
    case 2:
        h ^= int32(data[(length & ^3)+1]&0xff) << 8
        fallthrough
    case 1:
        h ^= int32(data[length & ^3] & 0xff)
        h *= m
    }

    h ^= int32(uint32(h) >> 13)
    h *= m
    h ^= int32(uint32(h) >> 15)

    return h
}

I generated the expected values for the tests in Java, using the Utils class mentioned above, like this:

System.out.println(Utils.murmur2("a-little-bit-long-string".getBytes("UTF-8")));

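None of the existing murmur2 implementations for Go that I have seen produces the same results as the Java code above.

The question is: how can the Java code above be ported to Go so that both produce the same results?

1 Answer:

Answer 0: (score: 1)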

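As @IskanderSharipov pointed out:

The Go version is missing one multiplication statement: the first k *= m in the loop, which in the Java code comes before k ^= k >>> r.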
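For reference, here is a sketch of the port with that multiplication restored. It is not the asker's exact code: it does all arithmetic in uint32 and converts to int32 only on return, on the assumption that wrap-around unsigned arithmetic matches Java's int arithmetic and that an unsigned >> matches Java's >>>. The expected values in main are the ones listed in the question.

```go
package main

import "fmt"

// murmur2 is a sketch of a Go port of the Java Utils.murmur2 shown in
// the question, with the missing multiplication restored.
func murmur2(data []byte) int32 {
	const (
		seed uint32 = 0x9747b28c
		m    uint32 = 0x5bd1e995
		r           = 24
	)
	length := len(data)
	h := seed ^ uint32(length)

	// Mix the input four bytes at a time, read as a little-endian word.
	for i := 0; i+4 <= length; i += 4 {
		k := uint32(data[i]) | uint32(data[i+1])<<8 |
			uint32(data[i+2])<<16 | uint32(data[i+3])<<24
		k *= m // the statement missing from the original port
		k ^= k >> r
		k *= m
		h *= m
		h ^= k
	}

	// Handle the last one to three bytes; the cases fall through
	// deliberately, like the Java switch without break statements.
	tail := length &^ 3
	switch length % 4 {
	case 3:
		h ^= uint32(data[tail+2]) << 16
		fallthrough
	case 2:
		h ^= uint32(data[tail+1]) << 8
		fallthrough
	case 1:
		h ^= uint32(data[tail])
		h *= m
	}

	h ^= h >> 13
	h *= m
	h ^= h >> 15
	return int32(h) // reinterpret the bits as a signed value, like Java's int
}

func main() {
	cases := []struct {
		input    string
		expected int32
	}{
		{"21", -973932308},
		{"foobar", -790332482},
		{"a-little-bit-long-string", -985981536},
		{"abc", 479470107},
	}
	for _, c := range cases {
		got := murmur2([]byte(c.input))
		fmt.Printf("%q: got %d, expected %d\n", c.input, got, c.expected)
	}
}
```

Inputs shorter than four bytes, such as "21" and "abc", never enter the loop, which is why the original port already produced the right value for them and failed only on longer inputs.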