
Use cases for Redis SETBIT, GETBIT, and BITCOUNT?

Time: 2015-05-15 21:39:44

Tags: bitmap redis

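After reading "Can someone explain redis setbit command?" and http://blog.getspool.com/2011/11/29/fast-easy-realtime-metrics-using-redis-bitmaps/ (referenced in the Redis documentation), I am still struggling to identify the use case for SETBIT as opposed to SET. The sources above seem to say that the motivation for using SETBIT to store events and "countable" binary data sets is that it significantly reduces the amount of data you need to store while keeping the data easy to access.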

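Is storing a website's daily unique visits by userID (identified by offset, starting from 0) in a bitmap such as 100000001, where the users with IDs 0 and 8 are the only visitors, better than setting timestamp:userID? Please explain. Thanks.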

I apologize for what is clearly a newbie question.

2 answers:

Answer 0: (score: 2)

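Bits are the basic unit of data that computers work with, and Redis's BIT* commands let you manipulate bit values easily. In the example the OP provides, the main benefit of using a bitstream is the space it saves.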
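To make the bit layout concrete, here is a minimal in-memory sketch of what SETBIT, GETBIT, and BITCOUNT do to a string value. It does not talk to Redis; the `Bitmap` class and its method names are hypothetical stand-ins. The one Redis detail it relies on is that offset 0 is the most significant bit of the first byte.

```python
class Bitmap:
    """Hypothetical in-memory stand-in for a Redis string used as a bitmap."""

    def __init__(self):
        self.data = bytearray()

    def setbit(self, offset, value):
        """Set the bit at `offset`, growing the buffer if needed; return the old bit."""
        byte_index, bit_index = divmod(offset, 8)
        if byte_index >= len(self.data):
            # Pad with zero bytes up to the byte that holds `offset`.
            self.data.extend(b"\x00" * (byte_index + 1 - len(self.data)))
        mask = 0x80 >> bit_index  # offset 0 is the most significant bit
        old = 1 if self.data[byte_index] & mask else 0
        if value:
            self.data[byte_index] |= mask
        else:
            self.data[byte_index] &= ~mask & 0xFF
        return old

    def getbit(self, offset):
        """Return the bit at `offset`; bits past the end read as 0."""
        byte_index, bit_index = divmod(offset, 8)
        if byte_index >= len(self.data):
            return 0
        return 1 if self.data[byte_index] & (0x80 >> bit_index) else 0

    def bitcount(self):
        """Count the bits that are set to 1."""
        return sum(bin(byte).count("1") for byte in self.data)


# The OP's example: users 0 and 8 are the only visitors today.
visits = Bitmap()
visits.setbit(0, 1)
visits.setbit(8, 1)
unique_visitors = visits.bitcount()
```

With these two calls the whole day's visitor set fits in two bytes, and `unique_visitors` is the daily unique count.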

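Keeping one key per login costs (at least) the size of the key plus the value, roughly 10 bytes in total, whereas a bitstream needs only 1 bit per user.

Answer 1: (score: 2)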
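As a rough back-of-the-envelope check of those figures, the sketch below compares the two layouts. The 10 bytes per login comes from the estimate above; the user and login counts are made-up inputs, and the helper names are hypothetical.

```python
def per_key_bytes(logins, bytes_per_login=10):
    """Storage if every login gets its own key (10 bytes each, per the estimate above)."""
    return logins * bytes_per_login


def bitmap_bytes(total_users):
    """Storage for one bit per user, rounded up to whole bytes."""
    return (total_users + 7) // 8


# Assumed inputs, for illustration only.
total_users = 1_000_000
logins_today = 600_000

key_cost = per_key_bytes(logins_today)   # 6,000,000 bytes
bitmap_cost = bitmap_bytes(total_users)  # 125,000 bytes
```

Note that the per-key cost grows with the number of logins, while the bitmap cost depends only on the size of the user ID range.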

The answer is: it depends. In the use case above it depends, for example, on how many logins you have per day (how many bits are set in the bitmask). If you have, say, only 2 logins, or sparse, random user IDs, it might be better to just store a list of logins.

But if you have an active user base and, say, 60% of all users are active, storing 1 bit per user (actually less than that on average, because Redis only stores the bitmask up to the highest set bit) is much more memory-friendly than storing IDs in a list. Storing IDs in a list uses, for example, 32 bits (an integer) to represent one bit of information, which is wasteful. It might cost even more if the list is built on a tree-like structure with explicit pointers to related nodes. Because RAM is fairly expensive and limited, and we want things to scale as well, one should aim for minimal memory usage while still meeting all query requirements.
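The trade-off can be sketched with a simplified cost model. It assumes 4 bytes per stored ID (the 32-bit figure above) and a bitmap that extends only to the highest set bit; real Redis encodings carry extra overhead, and the helper names are hypothetical.

```python
def id_list_bytes(active_ids, bytes_per_id=4):
    """Cost of storing each active ID as a 32-bit integer (simplified model)."""
    return len(active_ids) * bytes_per_id


def bitmap_bytes(active_ids):
    """Cost of a bitmap that ends at the byte holding the highest set bit."""
    return max(active_ids) // 8 + 1 if active_ids else 0


# Sparse day: only 2 logins, one of them with a very high ID.
sparse = [12, 999_999]

# Busy day: 60% of one million users (a deterministic stand-in for "active").
dense = [uid for uid in range(1_000_000) if uid % 5 < 3]

sparse_costs = (id_list_bytes(sparse), bitmap_bytes(sparse))  # (8, 125000)
dense_costs = (id_list_bytes(dense), bitmap_bytes(dense))     # (2400000, 125000)
```

Under this model the bitmap wins once more than about 1 in 32 IDs in the range is active, since each listed ID costs 32 bits against 1 bit per position.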

So this is something I would decide from use case to use case.

However, using bitmasks allows for very fast bulk filtering of huge datasets. Let's say you store 2 bitmasks: one is loggedInToday, the other is signedUpForNewsletter. By using a bit operation like AND (processors can do those operations really fast), you can filter out all user IDs (represented by the bit positions of the 1s) that have both logged in today and signed up for the newsletter. Because intersecting two bitmasks can be at least an order of magnitude faster than intersecting two ordered lists of IDs, you can do this operation on millions of users and still stay under 50 ms.
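The intersection described above can be sketched with plain Python integers as bitmasks, where bit position i stands for user ID i. This only illustrates the idea; in Redis itself the comparable operation would be the BITOP AND command on two bitmap keys. The helper names and sample IDs are hypothetical.

```python
def to_mask(user_ids):
    """Build an integer bitmask with bit `uid` set for every user ID."""
    mask = 0
    for uid in user_ids:
        mask |= 1 << uid
    return mask


def to_ids(mask):
    """Return the positions of the set bits, in ascending order."""
    ids = []
    position = 0
    while mask:
        if mask & 1:
            ids.append(position)
        mask >>= 1
        position += 1
    return ids


logged_in_today = to_mask([0, 3, 8, 21])
signed_up_for_newsletter = to_mask([3, 5, 21, 40])

# A single AND yields every user present in both sets.
both = logged_in_today & signed_up_for_newsletter
matching_ids = to_ids(both)  # [3, 21]
```

The AND itself touches each machine word once, regardless of how many IDs are set, which is where the speed advantage over merging two ID lists comes from.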

To wrap up my answer: bitmasks enable some real-time analytics that would otherwise not be real-time, and they can save you a lot of memory IF you expect many items in a list. Note that this is just one use; there are many others (like Bloom filters).