Question

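I ran some experiments, and it looks like Redis hashes are almost always more space-efficient than strings, even when the hash contains a single field!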

import redis
rc = redis.Redis(host='127.0.0.1', port=1234)
# Values are sent as the string 'True': redis-py 3.0+ rejects bool values.

# 1,000,000 string keys, one value each
rc.flushdb()
pipe = rc.pipeline()
for i in range(1000000):
    pipe.set('stream:'+str(i)+':target', 'True')
pipe.execute()
# uses 236 MB

# 1,000,000 hashes with a single field each
rc.flushdb()
pipe = rc.pipeline()
for i in range(1000000):
    pipe.hset('stream:'+str(i), 'target', 'True')
pipe.execute()
# uses 170 MB

# 500,000 pairs of string keys (1,000,000 keys in total)
rc.flushdb()
pipe = rc.pipeline()
for i in range(500000):
    pipe.set('stream:'+str(i)+':target', 'True')
    pipe.set('stream:'+str(i)+':ws', 'True')
pipe.execute()
# uses 238 MB

# 500,000 hashes with two fields each
rc.flushdb()
pipe = rc.pipeline()
for i in range(500000):
    pipe.hset('stream:'+str(i), ':target', 'True')
    pipe.hset('stream:'+str(i), ':ws', 'True')
pipe.execute()
# uses 113 MB

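Both hashes and strings have O(1) amortized read/write cost. If I don't need the fancier string operations such as APPEND, GETBIT, SETBIT, or RANGE, and only use plain SET/GET semantics, aren't hashes always going to be preferable? Am I missing something obvious? I would also love to learn why hashes are more space-efficient.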
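To make the comparison easier to read, the figures reported above can be normalized to bytes per stored value. This is only a small arithmetic sketch: it assumes "MB" means 10^6 bytes (the question does not say how memory was measured), and the helper names `bytes_per_value` and `savings` are made up for illustration.

```python
MB = 1_000_000  # assumption: the reported "MB" are decimal megabytes

def bytes_per_value(total_mb, n_values):
    """Average memory cost of one stored value, in bytes."""
    return total_mb * MB / n_values

def savings(baseline_mb, candidate_mb):
    """Fraction of memory saved by the candidate layout relative to the baseline."""
    return 1 - candidate_mb / baseline_mb

# Figures taken from the four experiments in the question.
# Each experiment stores 1,000,000 values in total.
experiments = {
    "1M string keys": 236,
    "1M single-field hashes": 170,
    "500k pairs of string keys": 238,
    "500k two-field hashes": 113,
}

for name, total_mb in experiments.items():
    print(f"{name}: {bytes_per_value(total_mb, 1_000_000):.0f} bytes per value")

print(f"single-field hash vs string: {savings(236, 170):.1%} saved")
print(f"two-field hash vs two strings: {savings(238, 113):.1%} saved")
```

Seen this way, the two-field hash layout costs roughly half as much per value as separate string keys in these runs, which is consistent with the per-key overhead being shared across fields.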

Answer 1

This Memory Optimization article discusses some of the issues you raise.

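If your data can be represented that way, the Redis recommendation is to use hashes: "Small hashes are encoded in a very small space, so you should try representing your data using hashes every time it is possible".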

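Redis considers a hash small if it can pack the hash into an array and still query it in amortized O(1). Keeping the data in a contiguous region of memory also helps performance, especially when adjacent elements of the array fall within the same cache line on your machine.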
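As a rough illustration of the idea, here is a toy Python sketch of a hash stored as one flat sequence of field/value pairs and searched by linear scan. It is not Redis's actual encoding (the real structure is a length-prefixed byte array implemented in C); the class name `PackedHash` and its methods are invented for this example. The point is that a scan over a small, size-capped array is bounded by a constant, so lookups stay effectively O(1) while the per-entry overhead of a real hash table is avoided.

```python
class PackedHash:
    """Toy stand-in for a compact hash encoding: one flat list, no hash table."""

    def __init__(self):
        # Layout: [field0, value0, field1, value1, ...]
        self._items = []

    def set(self, field, value):
        # Linear scan for an existing field; overwrite its value if found.
        for i in range(0, len(self._items), 2):
            if self._items[i] == field:
                self._items[i + 1] = value
                return
        # Otherwise append the new pair at the end of the flat list.
        self._items.extend((field, value))

    def get(self, field, default=None):
        # Linear scan; cheap as long as the number of entries stays small.
        for i in range(0, len(self._items), 2):
            if self._items[i] == field:
                return self._items[i + 1]
        return default

    def __len__(self):
        return len(self._items) // 2


h = PackedHash()
h.set("target", "True")
h.set("ws", "True")
h.set("ws", "False")  # overwrites the existing field in place
print(len(h), h.get("target"), h.get("ws"), h.get("missing"))
```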

In the Redis configuration you can find the following settings:

# Hashes are encoded using a memory efficient data structure when they have a
# small number of entries, and the biggest entry does not exceed a given
# threshold. These thresholds can be configured using the following directives.
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
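The two directives can be read as a simple predicate: a hash keeps the compact encoding while it has at most `hash-max-ziplist-entries` fields and no field name or value longer than `hash-max-ziplist-value` bytes. The sketch below restates that rule in Python using the default values shown above. The function name `stays_compact` is hypothetical, and the check is only an approximation of what the server does internally; the encoding Redis reports for a real key is what counts. (Redis 7 renamed these settings to `hash-max-listpack-entries` and `hash-max-listpack-value`.)

```python
def _as_bytes(x):
    """Approximate how a client would serialize a field or value."""
    return x if isinstance(x, bytes) else str(x).encode("utf-8")

def stays_compact(mapping, max_entries=512, max_value=64):
    """Guess whether a hash with these fields would keep the compact encoding.

    max_entries / max_value mirror the defaults of hash-max-ziplist-entries
    and hash-max-ziplist-value; pass your own server's settings if they differ.
    """
    if len(mapping) > max_entries:
        return False
    return all(
        len(_as_bytes(field)) <= max_value and len(_as_bytes(value)) <= max_value
        for field, value in mapping.items()
    )

print(stays_compact({"target": "True", "ws": "True"}))       # small hash
print(stays_compact({"payload": "x" * 65}))                   # value too long
print(stays_compact({f"f{i}": "1" for i in range(513)}))      # too many fields
```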

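You need to repeat your tests on both sides of these thresholds. In general, it is best to make your tests mimic your real data and access patterns as closely as possible. The premise of your question is that hashes should always be used, but bear in mind that you are relying on optimization heuristics that are not completely transparent to you as a user.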
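One way to run such a test is to write a hash just below and just above each threshold and ask the server how it stored it. The sketch below assumes a redis-py style client (`delete`, `hset` with `mapping=`, `object`, and `memory_usage`, the last of which needs a server that supports `MEMORY USAGE`). The helper names `probe_hash` and `probe_thresholds` and the key `probe:hash` are made up for illustration, and the default thresholds are the ones from the config excerpt above.

```python
def probe_hash(client, key, n_fields, value_len):
    """Write one hash and report how the server encoded and sized it."""
    client.delete(key)
    mapping = {f"f{i}": "x" * value_len for i in range(n_fields)}
    client.hset(key, mapping=mapping)
    return {
        "fields": n_fields,
        "value_len": value_len,
        "encoding": client.object("encoding", key),  # OBJECT ENCODING key
        "bytes": client.memory_usage(key),           # MEMORY USAGE key
    }

def probe_thresholds(client, key="probe:hash", max_entries=512, max_value=64):
    """Probe at the limits, one field past them, and one byte past them."""
    cases = [
        (max_entries, max_value),      # at both limits
        (max_entries + 1, max_value),  # one field too many
        (max_entries, max_value + 1),  # values one byte too long
    ]
    return [probe_hash(client, key, n, v) for n, v in cases]
```

Against a real server this could be called as `probe_thresholds(redis.Redis(host='127.0.0.1', port=1234))`. The first case would be expected to report a compact encoding and the other two a hash table, but the exact encoding names vary between Redis versions, so treat the output as something to inspect rather than assert on.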

Redis hash vs. string memory cost
