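I'm reading about Instagram's sharding technique. On slide 136 there is the following code (Python, I assume?) for pulling the shard_id out of a generated ID. I wanted to see whether I could get the shard_id back in Postgres, but I couldn't. That may be because I'm unfamiliar with bitwise operations, or with some other subtle difference between Python and Postgres operators.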
# Python code:
# pulling shard ID from ID:
shard_id = id ^ ((id >> 23) << 23)
timestamp = EPOCH + (id >> 23)  # parenthesized; the slide has EPOCH + id >> 23, which Python reads as (EPOCH + id) >> 23
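Operator precedence matters for the timestamp line: in Python `+` binds more tightly than `>>`, so `EPOCH + id >> 23` written without parentheses is parsed as `(EPOCH + id) >> 23`. A minimal check, reusing the epoch and millisecond values from the Postgres query below and assuming the 41-bit timestamp sits in the top bits (shard and sequence bits are left at zero for brevity):

```python
EPOCH = 1314220021721   # custom epoch in ms (value taken from the Postgres query)
ms = 1403496968580      # creation time in ms (value taken from the Postgres query)

# Timestamp shifted into the top 41 bits; shard and sequence bits left at zero.
# (id_ is used instead of id to avoid shadowing Python's builtin.)
id_ = (ms - EPOCH) << 23

without_parens = EPOCH + id_ >> 23   # parsed as (EPOCH + id_) >> 23
with_parens = EPOCH + (id_ >> 23)    # what the formula intends

print(with_parens == ms)     # True
print(without_parens == ms)  # False
```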
Question: besides the `^` operator, are there other operator differences between Python and Postgres?
My attempt in Postgres:
WITH var AS (
SELECT 1314220021721::bigint AS epoch
, 1403496968580::bigint AS ms
, (31341 % 2000)::bigint AS shard_id -- equals 1341
, (5000 % 1024)::bigint AS seq_id
), bit AS (
SELECT *
, ((ms) - epoch) << (64-41) AS ms_bit
, shard_id << (64-41-13) AS shard_bit
FROM var
), val AS (
SELECT *
, (ms_bit | shard_bit | seq_id) AS id
FROM bit
)
SELECT *
, ms_bit::bit(64) AS ms_64
, shard_bit::bit(64) AS shard_64
, seq_id::bit(64) AS seq_64
, id::bit(64) AS id_64
-- "shard_id_conv" should equal "shard_id" (**and does not**, instead it's 1374088)
-- note: '^' is changed to '#'
-- shard_id_conv = 1374088
, id # ((id >> 23) << 23) AS shard_id_conv
-- "ms_conv" should equal "ms" (and does)
, epoch + (id >> 23) AS ms_conv
-- "shard_seq" should equal "shard_id_conv" (it does, but neither is the actual shard_id)
-- shard_seq = 1374088
, (shard_bit | seq_id) AS shard_seq
FROM val;
/* -- 64 BIT
0000101001100100101010010000110011010101100000000000000000000000 -- ms_bit
0000000000000000000000000000000000000000000101001111010000000000 -- shard_bit
0000000000000000000000000000000000000000000000000000001110001000 -- seq_bit
0000101001100100101010010000110011010101100101001111011110001000 -- id_bit
*/
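The same arithmetic can be replayed in Python as a cross-check. This sketch mirrors the CTEs above step by step, with Python's `^` standing in for Postgres's `#`; it reproduces the 1374088 value and shows that it is just the shard bits and sequence bits OR-ed together:

```python
epoch = 1314220021721
ms = 1403496968580
shard_id = 31341 % 2000   # 1341
seq_id = 5000 % 1024      # 904

ms_bit = (ms - epoch) << (64 - 41)        # timestamp in the top 41 bits
shard_bit = shard_id << (64 - 41 - 13)    # shard ID in the next 13 bits
id_ = ms_bit | shard_bit | seq_id         # sequence number in the low 10 bits

shard_id_conv = id_ ^ ((id_ >> 23) << 23)  # slide formula (^ is XOR in Python)
ms_conv = epoch + (id_ >> 23)

print(shard_id_conv)                          # 1374088
print(shard_id_conv == (shard_bit | seq_id))  # True
print(ms_conv == ms)                          # True
print(format(id_, "064b"))                    # 64-bit binary view of the ID
```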
Answer (score: 1)
I *think* Instagram got the formula on the slide wrong, because the 10 seq_id bits also need to be removed.
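To see why the slide's expression returns more than the shard ID: `(id >> 23) << 23` keeps only the bits above the low 23, and XOR-ing them back into `id` clears them, so the expression equals `id` masked to its low 23 bits. With the 41/13/10-bit layout from the question, those 23 bits hold both the 13-bit shard ID and the 10-bit sequence number. A quick Python check of that identity over random non-negative 63-bit values:

```python
import random

LOW_23 = (1 << 23) - 1  # mask for the low 23 bits (13 shard bits + 10 sequence bits)

def slide_formula(x: int) -> int:
    # XOR-ing the high bits with themselves clears them, leaving the low 23 bits.
    return x ^ ((x >> 23) << 23)

for _ in range(10_000):
    x = random.getrandbits(63)  # non-negative, like a positive bigint ID
    assert slide_formula(x) == x & LOW_23

print("slide formula == low 23 bits")
```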
Note: `#` is the XOR operator in Postgres (where `^` means exponentiation); Instagram uses `^` as the XOR operator in the formula, as Python does.
**Incorrect:**
id # ((id >> 23) << 23) AS shard_id
**Correct:**
(id # ((id >> 23) << 23)) >> 10 AS shard_id
`>> 10` removes the seq_id bits by shifting them off to the right.
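A round-trip sketch of the corrected extraction in Python, assuming the 41/13/10-bit layout used in the question's query. `make_id` and `shard_from_id` are hypothetical helper names for illustration, not Instagram's code:

```python
import random

def make_id(ms_since_epoch: int, shard_id: int, seq_id: int) -> int:
    # 41-bit timestamp | 13-bit shard ID | 10-bit sequence number
    return (ms_since_epoch << 23) | (shard_id << 10) | seq_id

def shard_from_id(id_: int) -> int:
    # Keep the low 23 bits (as on the slide), then drop the 10 sequence bits.
    return (id_ ^ ((id_ >> 23) << 23)) >> 10

for _ in range(10_000):
    ms = random.getrandbits(41)
    shard = random.getrandbits(13)
    seq = random.getrandbits(10)
    assert shard_from_id(make_id(ms, shard, seq)) == shard

# Values from the question's query
print(shard_from_id(make_id(1403496968580 - 1314220021721, 1341, 904)))  # 1341
```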
If there is a way to remove the 10 seq_id bits that performs better in Postgres, please post it as an answer.
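One possible simplification, offered as a suggestion rather than a benchmarked improvement: shift the sequence bits out first and then keep the next 13 bits with a mask, i.e. `(id >> 10) & 8191`, where 8191 = 2^13 - 1. Postgres's `&` is bitwise AND on `bigint`, so the expression should carry over unchanged; the Python sketch below only checks that it agrees with the corrected formula, assuming the same 41/13/10-bit layout:

```python
import random

SHARD_MASK = (1 << 13) - 1  # 8191: covers the 13 shard bits

def shard_xor(id_: int) -> int:
    # Corrected slide formula from the answer.
    return (id_ ^ ((id_ >> 23) << 23)) >> 10

def shard_mask(id_: int) -> int:
    # Drop the 10 sequence bits, then keep only the next 13 bits.
    return (id_ >> 10) & SHARD_MASK

for _ in range(10_000):
    x = random.getrandbits(63)
    assert shard_xor(x) == shard_mask(x)

# ID built from the question's values: timestamp, shard 1341, sequence 904
question_id = ((1403496968580 - 1314220021721) << 23) | (1341 << 10) | 904
print(shard_mask(question_id))  # 1341
```

Keeping the parentheses explicit avoids relying on operator precedence, which differs between Python and Postgres.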