Counting bits in a Redshift column

Time: 2017-07-26 14:05:22

Tags: binary bit-manipulation aggregate-functions amazon-redshift bitcount

I have a BIGINT column in my Redshift table, and I want a query that will:

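  1. Count, for each bit position, how many times the value "1" appears in the binary values of all rows in this column.
  2. Display the result in a way that lets me take the top x bit_positions.
  3. For example (I have written the integer values in binary to simplify the example):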

    column
    --------
    11011110  = 222
    00000000  = 0
    11111100  = 252
    00011000  = 24
    11111100  = 252
    00011000  = 24
    11000010  = 194
    
    76543210 <- bit_position
    

    The query would return a table like this:

    bit_position   count
    0              0
    1              2
    2              3
    3              5
    4              5
    5              2
    6              4
    7              4
    

    In this case, I would be able to get the top five bit_positions: (3,4,6,7,2)

    Note: I may have up to 64 bit_positions.
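The expected table can be checked with a short Python sketch (an illustration only, not Redshift SQL) that applies the same per-position test to the sample values. The names `bit_counts` and `top_positions` are hypothetical, and breaking ties by the lower bit position is an assumption, since the question does not say how ties should be ordered.

```python
# Sample values from the question (8-bit example; the real column is BIGINT).
SAMPLE = [222, 0, 252, 24, 252, 24, 194]


def bit_counts(values, width=8):
    """Return a list where index k is the number of values with bit k set.

    Bit 0 is the least significant bit, matching the bit_position row
    (76543210) in the question. Python's >> on negative ints behaves like
    two's complement, so width=64 also works for signed BIGINT values.
    """
    counts = [0] * width
    for v in values:
        for k in range(width):
            if (v >> k) & 1:
                counts[k] += 1
    return counts


def top_positions(counts, x):
    """Return the x bit positions with the highest counts.

    Assumption: ties are broken by the lower bit position first.
    """
    ranked = sorted(range(len(counts)), key=lambda k: (-counts[k], k))
    return ranked[:x]


if __name__ == "__main__":
    counts = bit_counts(SAMPLE)
    for position, count in enumerate(counts):
        print(position, count)
    print(top_positions(counts, 5))
```

With this tie-break the top five come out in the same order as in the question, (3, 4, 6, 7, 2).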

1 Answer:

Answer 0 (score: 3)

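You can use bitwise AND (&) to check each position: bit_col & mask is non-zero exactly when the bit selected by mask is set.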
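A small Python model of that test, for illustration only; `to_int64`, `mask` and `bit_is_set` are hypothetical names. It shows that the constants 64, 32, down to 1 used below are the masks for bits 6 to 0, and why, once you extend the idea to bit 63 of a BIGINT, the comparison should be `<> 0` rather than `> 0`: as a signed 64-bit value, the bit-63 mask is negative.

```python
def to_int64(n):
    """Reinterpret an unbounded Python int as a signed 64-bit (BIGINT) value."""
    n &= (1 << 64) - 1
    return n - (1 << 64) if n >= (1 << 63) else n


INT64_MIN = to_int64(1 << 63)  # -9223372036854775808


def mask(k):
    """BIGINT mask selecting bit k (0 = least significant)."""
    return to_int64(1 << k)


def bit_is_set(value, k):
    # Compare with != 0 (SQL: <> 0), not > 0: value & mask(63) is
    # negative when bit 63 is set, so "> 0" would miss it.
    return (value & mask(k)) != 0


if __name__ == "__main__":
    print([mask(k) for k in range(7)])  # masks for bits 0..6
    v = -1  # a BIGINT with all 64 bits set
    print((v & mask(63)) > 0, bit_is_set(v, 63))
```

For bits 0 to 62 the two comparisons agree; only the sign bit needs the `<> 0` form.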

Here is an example that aggregates across rows:

SELECT SUM(CASE WHEN (bit_col & 64) <> 0 THEN 1 ELSE 0 END) "1000000"
     , SUM(CASE WHEN (bit_col & 32) <> 0 THEN 1 ELSE 0 END) "0100000"
     , SUM(CASE WHEN (bit_col & 16) <> 0 THEN 1 ELSE 0 END) "0010000"
     , SUM(CASE WHEN (bit_col & 8) <> 0 THEN 1 ELSE 0 END)  "0001000"
     , SUM(CASE WHEN (bit_col & 4) <> 0 THEN 1 ELSE 0 END)  "0000100"
     , SUM(CASE WHEN (bit_col & 2) <> 0 THEN 1 ELSE 0 END)  "0000010"
     , SUM(CASE WHEN (bit_col & 1) <> 0 THEN 1 ELSE 0 END)  "0000001"
FROM my_table
;
 1000000 | 0100000 | 0010000 | 0001000 | 0000100 | 0000010 | 0000001
---------+---------+---------+---------+---------+---------+---------
      11 |       8 |      11 |      13 |      11 |       9 |       8

To get the results into a single column, you can use UNION ALL:

-- "position" numbers the seven bits from the left: 1 = mask 64 through 7 = mask 1
          SELECT 1 AS "position", SUM(CASE WHEN (bit_col & 64) <> 0 THEN 1 ELSE 0 END) AS bit_count FROM my_table
UNION ALL SELECT 2 AS "position", SUM(CASE WHEN (bit_col & 32) <> 0 THEN 1 ELSE 0 END) AS bit_count FROM my_table
UNION ALL SELECT 3 AS "position", SUM(CASE WHEN (bit_col & 16) <> 0 THEN 1 ELSE 0 END) AS bit_count FROM my_table
UNION ALL SELECT 4 AS "position", SUM(CASE WHEN (bit_col &  8) <> 0 THEN 1 ELSE 0 END) AS bit_count FROM my_table
UNION ALL SELECT 5 AS "position", SUM(CASE WHEN (bit_col &  4) <> 0 THEN 1 ELSE 0 END) AS bit_count FROM my_table
UNION ALL SELECT 6 AS "position", SUM(CASE WHEN (bit_col &  2) <> 0 THEN 1 ELSE 0 END) AS bit_count FROM my_table
UNION ALL SELECT 7 AS "position", SUM(CASE WHEN (bit_col &  1) <> 0 THEN 1 ELSE 0 END) AS bit_count FROM my_table
ORDER BY bit_count DESC
;
 position | bit_count
----------+-----------
        6 |         6
        7 |         6
        4 |         4
        5 |         4
        2 |         0
        3 |         0
        1 |         0
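Typing 64 branches by hand is error-prone, so here is a minimal Python sketch that generates the same UNION ALL query for any number of bits up to 64. `build_bitcount_sql` is a hypothetical helper; it numbers positions the way the question does (0 = least significant bit), adds an optional LIMIT for the top x positions, and builds masks with `(1::BIGINT << k)`, assuming Redshift accepts the `<<` shift operator on BIGINT as PostgreSQL does (check the operator page linked below).

```python
def build_bitcount_sql(table="my_table", column="bit_col", bits=64, top=None):
    """Return a UNION ALL query counting set bits per position.

    Hypothetical helper; table/column default to the answer's names.
    Position 0 is the least significant bit, as in the question.
    """
    if not 1 <= bits <= 64:
        raise ValueError("bits must be between 1 and 64 for a BIGINT column")
    parts = []
    for k in range(bits):
        # <> 0 (not > 0) so that bit 63, whose BIGINT mask is negative, counts.
        parts.append(
            f'SELECT {k} AS "position", '
            f"SUM(CASE WHEN ({column} & (1::BIGINT << {k})) <> 0 "
            f"THEN 1 ELSE 0 END) AS bit_count FROM {table}"
        )
    sql = "\nUNION ALL ".join(parts)
    # ORDER BY / LIMIT after a UNION ALL apply to the combined result.
    sql += '\nORDER BY bit_count DESC, "position"'
    if top is not None:
        sql += f"\nLIMIT {int(top)}"
    return sql + ";"


if __name__ == "__main__":
    print(build_bitcount_sql(bits=8, top=5))
```

Each UNION ALL branch likely scans my_table on its own, so 64 branches mean 64 scans; the column-per-bit query at the start of this answer computes every count in one pass if that cost matters.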

http://docs.aws.amazon.com/redshift/latest/dg/r_OPERATOR_SYMBOLS.html

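Edit: If you want something more dynamic, you will need to look into using a UDF. You can start from my f_bitwise_to_string UDF as a template and add what you need from there: https://github.com/awslabs/amazon-redshift-udfs/blob/master/scalar-udfs/f_bitwise_to_string.sql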
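A minimal sketch of a scalar function body one might start from, assuming a Redshift Python UDF (LANGUAGE plpythonu, which runs Python 2.7). `bigint_to_bitstring` is a hypothetical name and this is not the code of f_bitwise_to_string. It renders a BIGINT as a 64-character string of 0s and 1s, treating negative values as two's complement so that bit 63 is preserved.

```python
# Body for a hypothetical Redshift scalar Python UDF, e.g. registered as
# bigint_to_bitstring(BIGINT) RETURNS VARCHAR(64) ... LANGUAGE plpythonu.
# Written to run under both Python 2.7 and Python 3.


def bigint_to_bitstring(value, width=64):
    """Render a signed BIGINT as '0'/'1' characters, most significant bit first.

    Character index width - 1 - k holds bit k (bit 0 = least significant).
    """
    if value is None:  # SQL NULL arrives as None
        return None
    # Masking gives the two's-complement bit pattern for negative values.
    unsigned = value & ((1 << width) - 1)
    return format(unsigned, "0{}b".format(width))


if __name__ == "__main__":
    print(bigint_to_bitstring(222, 8))
    print(bigint_to_bitstring(-1))
```

With a 64-character result, bit k sits at 1-based offset 64 - k of the string, so a per-position count could then be taken with SUBSTRING on the SQL side.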