Getting the shard ID using bitwise operators in Postgres

Time: 2014-06-23 06:25:30

Tags: sql postgresql instagram sharding

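I was reading about Instagram's sharding technique, and slide 136 of the slideshow has the following code (Python, I assume?) for getting the shard_id from a generated ID. I decided to see whether I could get the shard_id in Postgres, but I have not been able to. That is probably because I am not familiar with bitwise operations, or with other nuances of Python operators versus Postgres operators.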

# Python code:
# pulling shard ID from ID:
shard_id = id ^ ((id >> 23) << 23)
timestamp = EPOCH + id >> 23
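
One thing worth checking in the second line of the snippet: in Python, `+` binds more tightly than `>>`, so `EPOCH + id >> 23` groups as `(EPOCH + id) >> 23`. The sketch below illustrates the difference. The sample values are the epoch and millisecond timestamp used in the SQL in this question, and `sample_id` is a hypothetical ID with only the time bits set.

```python
# Sample values copied from the SQL in this question.
EPOCH = 1314220021721   # custom epoch, in milliseconds
ms = 1403496968580      # a wall-clock time, in milliseconds

# Hypothetical ID: elapsed milliseconds in the upper bits, low 23 bits empty.
sample_id = (ms - EPOCH) << 23

# As written on the slide: the addition happens before the shift.
as_written = EPOCH + sample_id >> 23

# With explicit parentheses: shift first, then add the epoch back.
intended = EPOCH + (sample_id >> 23)

print(intended == ms)    # True
print(as_written == ms)  # False
```

The parenthesized form is the one that recovers the original timestamp.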

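Questions:

  1. Besides the ^ operator, are there other operator differences between Python and Postgres that affect this code?
  2. Is there more to getting the shard_id than the Instagram snippet shows? What comes to mind is that the seq_id also needs to be removed. UPDATE: it looks like (shard_bit | seq_id) = shard_id_conv.
  3. What is the correct way to get the shard_id in Postgres?

    WITH var AS (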
    SELECT 1314220021721::bigint AS epoch
        , 1403496968580::bigint AS ms
        , (31341 % 2000)::bigint AS shard_id -- equals 1341
        , (5000 % 1024)::bigint AS seq_id
    ), bit AS (
    SELECT *
        , ((ms) - epoch) << (64-41) AS ms_bit
        , shard_id << (64-41-13) AS shard_bit
    FROM var
    ), val AS (
    SELECT *
        , (ms_bit | shard_bit | seq_id) AS id
    FROM bit
    )
    SELECT *
        , ms_bit::bit(64) AS ms_64
        , shard_bit::bit(64) AS shard_64
        , seq_id::bit(64) AS seq_64
        , id::bit(64) AS id_64
    
        -- "shard_id_conv" should equal "shard_id" (**and does not**, instead it's 1374088)
        -- note: '^' is changed to '#'
        -- shard_id_conv = 1374088
        , id # ((id >> 23) << 23) AS shard_id_conv 
    
        -- "ms_conv" should equal "ms" (and does)
        , epoch + (id >> 23) AS ms_conv
    
        -- "shard_seq" equals "shard_id_conv" (and does, but isn't the actual shard_id)
        -- shard_seq = 1374088
        , (shard_bit | seq_id) AS shard_seq
    FROM val;
    /* -- 64 BIT
    0000101001100100101010010000110011010101100000000000000000000000 -- ms_bit
    0000000000000000000000000000000000000000000101001111010000000000 -- shard_bit
    0000000000000000000000000000000000000000000000000000001110001000 -- seq_bit
    0000101001100100101010010000110011010101100101001111011110001000 -- id_bit
    */
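
The layout implied by the shifts in this query (41 bits of milliseconds, 13 bits of shard ID, 10 bits of sequence) can be reproduced in Python as a cross-check. This is a minimal sketch: `make_id` and the constant names are hypothetical and do not come from the slides.

```python
# Bit widths inferred from the shifts in the SQL: 41 + 13 + 10 = 64.
TIME_BITS, SHARD_BITS, SEQ_BITS = 41, 13, 10

def make_id(ms, epoch, shard_id, seq_id):
    """Pack the three fields into one integer (hypothetical helper)."""
    ms_bit = (ms - epoch) << (SHARD_BITS + SEQ_BITS)  # shift left by 23
    shard_bit = shard_id << SEQ_BITS                   # shift left by 10
    return ms_bit | shard_bit | seq_id

epoch = 1314220021721
ms = 1403496968580
shard_id = 31341 % 2000   # 1341
seq_id = 5000 % 1024      # 904

new_id = make_id(ms, epoch, shard_id, seq_id)
shard_seq = (shard_id << SEQ_BITS) | seq_id

print(format(new_id, "064b"))        # same bits as the id_bit row
print(shard_seq)                     # 1374088
print(epoch + (new_id >> 23) == ms)  # True
```

The value 1374088 is the shard ID and the sequence number packed together, which matches what the query reports for shard_seq.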
    

1 Answer:

Answer 0 (score: 1)

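I think Instagram botched the formula in the slide, because the 10 seq_id bits also need to be removed.

Note: # is the XOR operator in Postgres. Instagram uses ^ as the XOR operator in its formula.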

Incorrect:

id # ((id >> 23) << 23) AS shard_id
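
To see why this expression on its own is not the shard ID: XOR-ing an ID with a copy of itself whose low 23 bits are cleared leaves exactly those low 23 bits, which hold both the shard ID and the sequence number. A minimal Python sketch, assuming the 41/13/10 bit layout from the question (the function names are hypothetical). In Python `^` is XOR, whereas in Postgres `^` is exponentiation and `#` is XOR.

```python
LOW_BITS = 23  # 13 shard bits + 10 sequence bits, per the layout in the question

def slide_formula(id_):
    """The expression from the slide, written with Python's ^ (XOR)."""
    return id_ ^ ((id_ >> LOW_BITS) << LOW_BITS)

def low_bits_mask(id_):
    """Keep only the lowest 23 bits with a bitwise AND."""
    return id_ & ((1 << LOW_BITS) - 1)

# Sample ID rebuilt from the question's values: time, shard 1341, sequence 904.
sample_id = (89276946859 << 23) | (1341 << 10) | 904

print(slide_formula(sample_id))  # 1374088
print(low_bits_mask(sample_id))  # 1374088
```

Both functions return the combined shard and sequence bits, not 1341.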

Correct:

(id # ((id >> 23) << 23)) >> 10 AS shard_id

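>> 10 removes the seq_id bits by shifting them off to the right.

If there is a better way to remove the 10 seq_id bits that performs better in Postgres, please answer.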
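
One possible alternative is to shift first and then mask, which avoids the XOR step. The sketch below is in Python and assumes the 41/13/10 bit layout from the question. `decode` is a hypothetical helper, and nothing here has been benchmarked. In Postgres the shard expression would presumably look like `(id >> 10) & 8191`, which is an untested assumption.

```python
SEQ_BITS = 10
SHARD_BITS = 13

def decode(id_, epoch):
    """Split an ID into (ms, shard_id, seq_id); hypothetical helper."""
    seq_id = id_ & ((1 << SEQ_BITS) - 1)                   # low 10 bits
    shard_id = (id_ >> SEQ_BITS) & ((1 << SHARD_BITS) - 1)  # next 13 bits
    ms = epoch + (id_ >> (SEQ_BITS + SHARD_BITS))           # remaining high bits
    return ms, shard_id, seq_id

epoch = 1314220021721
sample_id = (89276946859 << 23) | (1341 << 10) | 904

# The formula from this answer, for comparison.
xor_then_shift = (sample_id ^ ((sample_id >> 23) << 23)) >> 10

print(decode(sample_id, epoch))  # (1403496968580, 1341, 904)
print(xor_then_shift)            # 1341
```

Both approaches give the same shard ID for this sample; whether either one is faster in Postgres would need to be measured.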