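Just as we have aggregate functions such as sum and count, is there a way to compute an aggregate hash over a field? For example, suppose you have the following set of records: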
Name ID
Bob 1
Bob 2
Bob 3
Bob 4
Conceptually, I would like to do something like this:
select name, hash(id) from mydb.mytable
group by 1
...which would return this:
Name ID
Bob D8-F0-00-91
If I deleted the record with ID = 3, the aggregate would instead return:
Name ID
Bob A8-EB-6D-1D
FYI, I used select hashrow(1,2,3,4)
and select hashrow(1,2,4)
to get the hash values above.
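Update for dnoeth: it would probably have helped to mention that the aggregate I need to produce must be unique. Here is an example of the data model I am working with: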
table office (Id integer)
table employee (Id integer, OfficeId integer)
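Each office has employees, so the relationship from office to employee is one-to-many, and the employee table has OfficeId as an FK to the office table.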
locking row for access
select n, count(n) from
(
select
officeid,
sum(cast(from_bytes('00'xb || hashrow(id), 'base10') as bigint)) n
from mydb.employee
group by 1
) x
group by 1
Here is a good example of where a collision occurs. However, I had not mentioned that I need the result to be unique.
select
cast(from_bytes('00'xb || hashrow(2300015), 'base10') as bigint) +
cast(from_bytes('00'xb || hashrow(14100028), 'base10') as bigint) hash1,
cast(from_bytes('00'xb || hashrow(1000004), 'base10') as bigint) +
cast(from_bytes('00'xb || hashrow(3100014), 'base10') as bigint) +
cast(from_bytes('00'xb || hashrow(12300025), 'base10') as bigint) hash2
Answer 0 (score: 3)
You need to convert the result of HASHROW to a numeric value; then you can simply sum it.
-- HASHROW to unsigned integer, TD14
SUM(CAST(FROM_BYTES('00'xb||HASHROW(ColumnName), 'base10') AS BIGINT)
) AS SumHash
-- HASHROW to unsigned integer, pre-TD14
SUM( HASHBUCKET( HASHROW(ColumnName) (BYTE(4))) / ((HASHBUCKET()+1)/65536) * CAST(65536 AS BIGINT)
+ HASHBUCKET(SUBSTR(HASHROW(ColumnName),3,2) (BYTE(4))) / ((HASHBUCKET()+1)/65536)
) AS SumHash
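To illustrate what the SumHash expression above does, here is a minimal Python sketch of the same idea: hash each value to an unsigned 32-bit integer and add the results. Note the assumptions: zlib.crc32 is only a stand-in for HASHROW (it is not Teradata's hashing algorithm, so the numbers will not match), and row_hash32 and sum_hash are hypothetical names chosen for this example.

```python
import zlib


def row_hash32(value):
    """Stand-in for HASHROW: an unsigned 32-bit hash of a single value.

    Assumption: CRC-32 is used purely for illustration; it is NOT
    the algorithm Teradata uses.
    """
    return zlib.crc32(str(value).encode("utf-8")) & 0xFFFFFFFF


def sum_hash(values):
    """Aggregate hash: the sum of the per-value 32-bit hashes."""
    return sum(row_hash32(v) for v in values)


if __name__ == "__main__":
    # Addition is commutative, so row order does not matter.
    print(sum_hash([1, 2, 3, 4]) == sum_hash([4, 3, 2, 1]))
    # Dropping ID 3 changes the aggregate.
    print(sum_hash([1, 2, 3, 4]) != sum_hash([1, 2, 4]))
```

Because addition does not depend on order, the aggregate is the same no matter how the rows are scanned, which is what an aggregate function needs. The price is that different sets of 32-bit hashes can add up to the same total.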
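Edit:
There is no way to get a guaranteed unique result when you hash; the probability of a collision only decreases as the computed hash gets longer. And HASHROW returns just a 4-byte value, which you are simply adding up :-(
You could install one of the existing hash UDFs that return far more than 4 bytes, such as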
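As a rough illustration of how hash length affects the odds, the sketch below uses the standard birthday approximation, p ≈ 1 - exp(-n(n-1) / 2^(b+1)), for n distinct inputs and a b-bit hash. This assumes an ideal, uniformly distributed hash and covers collisions between individual hash values only; it does not model collisions between sums of hashes, which can only make things worse. The function name is hypothetical.

```python
import math


def collision_probability(n_values, hash_bits):
    """Birthday approximation: chance that at least two of n_values
    distinct inputs share the same hash_bits-bit hash value.

    Assumption: the hash behaves like a uniform random function.
    """
    exponent = n_values * (n_values - 1) / 2 / 2 ** hash_bits
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    for bits in (32, 128, 160, 256):
        print(bits, collision_probability(100_000, bits))
```

With 100,000 values, a 32-bit hash is more likely than not to contain at least one collision, while at 128 bits and beyond the estimate is negligibly small, though never exactly zero.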
https://downloads.teradata.com/download/extensibility/sha-1-message-digest-udf
https://downloads.teradata.com/download/extensibility/md5-message-digest-udf
https://github.com/akuroda/teradata-udf-sha2
and then implement an aggregate XOR UDF.
There is a blog post on Teradata's DevEx about comparing tables:
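The logic of such an XOR aggregate over longer digests can be sketched in Python as follows. This is only an illustration of the idea, not a Teradata UDF: hashlib.sha256 stands in for whichever digest UDF is installed, and digest and xor_aggregate are hypothetical names.

```python
import hashlib

DIGEST_SIZE = 32  # SHA-256 produces 32 bytes


def digest(value):
    """Per-row digest; SHA-256 is an assumption standing in for the hash UDF."""
    return hashlib.sha256(str(value).encode("utf-8")).digest()


def xor_aggregate(values):
    """XOR all per-row digests together and return the result as hex."""
    acc = bytes(DIGEST_SIZE)  # all-zero starting value
    for v in values:
        acc = bytes(a ^ b for a, b in zip(acc, digest(v)))
    return acc.hex()


if __name__ == "__main__":
    # XOR is commutative and associative, so order does not matter.
    print(xor_aggregate([1, 2, 3, 4]) == xor_aggregate([4, 3, 2, 1]))
    # Removing ID 3 yields a different aggregate.
    print(xor_aggregate([1, 2, 3, 4]) != xor_aggregate([1, 2, 4]))
```

Like the sum, XOR is order-independent, and with a 256-bit digest an accidental collision becomes far less likely, though still not impossible. One caveat: identical values cancel each other out under XOR, so this sketch is only meaningful when the aggregated values are distinct within a group, as employee Ids within an office presumably are.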