Aggregate hash function?

Asked: 2014-06-17 11:35:24

Tags: sql teradata

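Just as we have aggregate functions such as sum and count, is there a way to compute an aggregate hash over a field? For example, suppose you have the following set of records: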

Name     ID
Bob      1
Bob      2
Bob      3
Bob      4

Conceptually, I would like to do something like this:

select name, hash(id) from mydb.mytable
group by 1

...which would return this:

Name     ID
Bob      D8-F0-00-91

If I delete the record with ID = 3, the aggregate would then return:

Name     ID
Bob      A8-EB-6D-1D

FYI, I used select hashrow(1,2,3,4) and select hashrow(1,2,4) to get the hash values shown above.

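Update for dnoeth: it probably would have helped to mention that I need the resulting aggregate to be unique. Here is an example of the data model I am working with: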

table office (Id integer)
table employee (Id integer, OfficeId integer)

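Each office has employees, so the relationship from office to employee is one-to-many, and the employee table carries OfficeId as an FK to the office table.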

locking row for access
select n, count(n) from
(
    select 
        officeid, 
        sum(cast(from_bytes('00'xb || hashrow(id), 'base10') as bigint)) n 
    from mydb.employee
    group by 1
) x
group by 1

Here is a good example of where a collision occurs. But again, I had not mentioned that I need the result to be unique.

select 
    cast(from_bytes('00'xb || hashrow(2300015), 'base10') as bigint) +
    cast(from_bytes('00'xb || hashrow(14100028), 'base10') as bigint) hash1,

    cast(from_bytes('00'xb || hashrow(1000004), 'base10') as bigint) +
    cast(from_bytes('00'xb || hashrow(3100014), 'base10') as bigint) +
    cast(from_bytes('00'xb || hashrow(12300025), 'base10') as bigint) hash2

1 Answer:

Answer 0 (score: 3)

You need to cast the result of HASHROW to a numeric value; after that you can simply sum it up.

-- HASHROW to unsigned integer, TD14
SUM(CAST(FROM_BYTES('00'xb||HASHROW(ColumnName), 'base10') AS BIGINT)
   ) AS SumHash

-- HASHROW to unsigned integer, pre-TD14
SUM(  HASHBUCKET(       HASHROW(ColumnName)      (BYTE(4)))  / ((HASHBUCKET()+1)/65536) * CAST(65536 AS BIGINT)
    + HASHBUCKET(SUBSTR(HASHROW(ColumnName),3,2) (BYTE(4)))  / ((HASHBUCKET()+1)/65536)
   ) AS SumHash
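To see what the TD14 expression does outside the database, here is a minimal Python sketch. It does not reproduce Teradata's HASHROW: `fake_hashrow` is a hypothetical stand-in that takes the first 4 bytes of a SHA-256 digest, chosen only so the example has a 4-byte value to work with. It also assumes the concatenated bytes are read as a signed big-endian number, which is what the zero-byte prefix in the answer suggests. Under those assumptions, the sketch shows that the prefix keeps every value non-negative and that the sum ignores row order.

```python
import hashlib

def fake_hashrow(value: int) -> bytes:
    """Hypothetical stand-in for HASHROW: 4 bytes derived from SHA-256."""
    return hashlib.sha256(str(value).encode("ascii")).digest()[:4]

def to_unsigned(hash_bytes: bytes) -> int:
    """Mimic '00'xb || hash read as a signed big-endian number.

    The leading zero byte clears the sign bit, so the result is never negative.
    """
    return int.from_bytes(b"\x00" + hash_bytes, byteorder="big", signed=True)

def sum_hash(ids) -> int:
    """Order-independent aggregate: the sum of the per-row hash values."""
    return sum(to_unsigned(fake_hashrow(i)) for i in ids)

if __name__ == "__main__":
    # Same rows in a different order give the same aggregate.
    print(sum_hash([1, 2, 3, 4]) == sum_hash([4, 3, 2, 1]))
    # Dropping a row changes the aggregate.
    print(sum_hash([1, 2, 3, 4]) == sum_hash([1, 2, 4]))
```

Because each term is at most 32 bits wide, many different sets of rows can add up to the same total, which is the collision problem described in the question.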

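Edit

There is no way to get a guaranteed unique result when you hash; the chance of avoiding a collision only improves with the length of the computed hash. And HASHROW returns just a 4-byte value, which you are simply adding up :-(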
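How much the hash length matters can be estimated with the standard birthday approximation. The sketch below is only a rough model: it assumes the hash values behave like independent uniform random draws, and the row count of 100,000 is an arbitrary illustrative figure, not something taken from the question.

```python
import math

def collision_probability(n_values: int, hash_bits: int) -> float:
    """Birthday approximation: 1 - exp(-n(n-1) / 2^(bits+1)).

    Estimates the chance that at least two of n_values uniformly random
    hash_bits-wide values are equal.
    """
    exponent = -n_values * (n_values - 1) / (2.0 * 2.0 ** hash_bits)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)

if __name__ == "__main__":
    # Compare a 4-byte hash with MD5-, SHA-1- and SHA-256-sized digests.
    for bits in (32, 128, 160, 256):
        print(bits, collision_probability(100_000, bits))
```

With a 32-bit value a collision among that many values is more likely than not under this model, while with 128 bits or more it becomes vanishingly small but never exactly zero.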

You could install one of the existing hash UDFs that return far more than 4 bytes, such as

https://downloads.teradata.com/download/extensibility/sha-1-message-digest-udf

https://downloads.teradata.com/download/extensibility/md5-message-digest-udf

https://github.com/akuroda/teradata-udf-sha2

and then implement an aggregate XOR UDF.
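The idea of XOR-ing long digests together can be sketched in Python as follows. This is an illustration of the technique, not the UDF itself: it uses `hashlib.sha256` in place of the linked Teradata UDFs, and the function names are made up for this example.

```python
import hashlib
from functools import reduce

def sha256_of(value: int) -> int:
    """Digest of the decimal text of value, as a 256-bit integer."""
    digest = hashlib.sha256(str(value).encode("ascii")).digest()
    return int.from_bytes(digest, byteorder="big")

def xor_aggregate(ids) -> str:
    """XOR all digests together; the result does not depend on row order."""
    combined = reduce(lambda acc, i: acc ^ sha256_of(i), ids, 0)
    # Render as 64 hex characters (256 bits), zero-padded.
    return format(combined, "064x")

if __name__ == "__main__":
    print(xor_aggregate([1, 2, 3, 4]))
    print(xor_aggregate([1, 2, 4]))
```

XOR is commutative and associative, so the rows can be combined in any order, and the result stays as wide as the digest instead of being squeezed into 4 bytes. One caveat is that a value appearing twice cancels itself out, so this fits best when the hashed column is unique within each group, as an employee Id would be.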

There is also a blog post on Teradata's DevEx about comparing tables:

http://developer.teradata.com/blog/ulrich/2013/05/calculation-of-table-hash-values-to-compare-table-content