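Kafka's JVM client uses a custom implementation of the murmur2 hash in its default partitioner.
None of the Kafka clients for Go implement this hashing algorithm, which causes all sorts of problems when you need consistent partitioning across different clients on different platforms.
I tried to port this code to Go; it seems to work for some values but not for others.
Here is the Java code (source: https://github.com/apache/kafka/blob/1.0.0/clients/src/main/java/org/apache/kafka/common/utils/Utils.java#L353 ):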
public static int murmur2(final byte[] data) {
    int length = data.length;
    int seed = 0x9747b28c;
    // 'm' and 'r' are mixing constants generated offline.
    // They're not really 'magic', they just happen to work well.
    final int m = 0x5bd1e995;
    final int r = 24;

    // Initialize the hash to a random value
    int h = seed ^ length;
    int length4 = length / 4;

    for (int i = 0; i < length4; i++) {
        final int i4 = i * 4;
        int k = (data[i4 + 0] & 0xff) + ((data[i4 + 1] & 0xff) << 8) + ((data[i4 + 2] & 0xff) << 16) + ((data[i4 + 3] & 0xff) << 24);
        k *= m;
        k ^= k >>> r;
        k *= m;
        h *= m;
        h ^= k;
    }

    // Handle the last few bytes of the input array
    switch (length % 4) {
        case 3:
            h ^= (data[(length & ~3) + 2] & 0xff) << 16;
        case 2:
            h ^= (data[(length & ~3) + 1] & 0xff) << 8;
        case 1:
            h ^= data[length & ~3] & 0xff;
            h *= m;
    }

    h ^= h >>> 13;
    h *= m;
    h ^= h >>> 15;

    return h;
}
Here is the Go code (playground link: https://play.golang.org/p/K4VooLZ4Mp7):
package main

import "fmt"

func main() {
	cases := []struct {
		Input    []byte
		Expected int32
	}{
		{[]byte("21"), -973932308},
		{[]byte("foobar"), -790332482},                                            // outputs: 1518714010
		{[]byte("a-little-bit-long-string"), -985981536},                          // outputs 2068422364
		{[]byte("a-little-bit-longer-string"), -1486304829},                       // outputs 1797390322
		{[]byte("lkjh234lh9fiuh90y23oiuhsafujhadof229phr9h19h89h8"), -58897971}, // outputs -1332218133
		{[]byte{'a', 'b', 'c'}, 479470107},
	}

	for _, c := range cases {
		if res := murmur2(c.Input); res != c.Expected {
			fmt.Printf("input: %q, expected: %d, got: %d\n", c.Input, c.Expected, res)
		}
	}
}

func murmur2(data []byte) int32 {
	length := int32(len(data))
	seed := uint32(0x9747b28c)
	m := int32(0x5bd1e995)
	r := uint32(24)

	h := int32(seed ^ uint32(length))
	length4 := length / 4

	for i := int32(0); i < length4; i++ {
		i4 := i * 4
		k := int32(data[i4+0]&0xff) + (int32(data[i4+1]&0xff) << 8) + (int32(data[i4+2]&0xff) << 16) + (int32(data[i4+3]&0xff) << 24)
		k ^= int32(uint32(k) >> r)
		k *= m
		h *= m
		h ^= k
	}

	switch length % 4 {
	case 3:
		h ^= int32(data[(length & ^3)+2]&0xff) << 16
		fallthrough
	case 2:
		h ^= int32(data[(length & ^3)+1]&0xff) << 8
		fallthrough
	case 1:
		h ^= int32(data[length & ^3] & 0xff)
		h *= m
	}

	h ^= int32(uint32(h) >> 13)
	h *= m
	h ^= int32(uint32(h) >> 15)

	return h
}
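I generated the expected values for the tests in Java with the Utils class mentioned above, like this:
System.out.println(Utils.murmur2("a-little-bit-long-string".getBytes("UTF-8")));
None of the existing murmur2 implementations for Go that I have seen produce the same results as the Java code above.
The question is: how do I port the Java code above to Go so that both produce identical results?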
Answer 0 (score: 1)
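As @IskanderSharipov pointed out:
The Go version is missing one multiplication statement: the first k *= m in the loop, which in the Java code comes before k ^= k >>> r.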
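For reference, here is a minimal sketch of the port with that multiplication restored. It does all arithmetic in uint32 and converts to int32 only at the end, which should match Java's wrapping int arithmetic and its unsigned >>> shift. The partitionFor helper is a hypothetical name added for illustration; it assumes the Java default partitioner masks off the sign bit and takes the hash modulo the partition count, so check that against the client version you need to match.

```go
package main

import "fmt"

// murmur2 is a sketch of a Go port of the Java Utils.murmur2 shown in the question.
// Working in uint32 avoids the int32/uint32 round trips of the original attempt.
func murmur2(data []byte) int32 {
	const (
		seed uint32 = 0x9747b28c
		m    uint32 = 0x5bd1e995
		r           = 24
	)

	length := len(data)
	h := seed ^ uint32(length)

	// Mix the input four bytes at a time, read as a little-endian word.
	for i := 0; i+4 <= length; i += 4 {
		k := uint32(data[i]) | uint32(data[i+1])<<8 | uint32(data[i+2])<<16 | uint32(data[i+3])<<24
		k *= m // the statement that was missing from the question's Go code
		k ^= k >> r
		k *= m
		h *= m
		h ^= k
	}

	// Handle the last 1 to 3 bytes; the Java switch falls through on purpose.
	tail := length &^ 3
	switch length % 4 {
	case 3:
		h ^= uint32(data[tail+2]) << 16
		fallthrough
	case 2:
		h ^= uint32(data[tail+1]) << 8
		fallthrough
	case 1:
		h ^= uint32(data[tail])
		h *= m
	}

	h ^= h >> 13
	h *= m
	h ^= h >> 15

	return int32(h)
}

// partitionFor is a hypothetical helper (not part of the question). It assumes
// the partition is (hash with the sign bit cleared) modulo numPartitions.
func partitionFor(key []byte, numPartitions int32) int32 {
	return (murmur2(key) & 0x7fffffff) % numPartitions
}

func main() {
	for _, key := range []string{"21", "foobar", "a-little-bit-long-string"} {
		fmt.Printf("key=%q hash=%d partition=%d\n", key, murmur2([]byte(key)), partitionFor([]byte(key), 8))
	}
}
```

Running the test cases from the question against this version is a quick way to confirm that it agrees with the Java output on your inputs.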