Hidden unwanted duplicates in a simple query
Our data:
We have 256 uuid1 shards.
create table xx ( uuid1shard char(2), uuid1 char(36), timestamp timestamp, uuid2shard char(2), uuid2 char(36), col001 .. col190 int / bigint / char, payload varchar(100) )
Our findings:
Works fine:
select
row_number() over (partition by uuid2shard, uuid2 order by (timestamp, uuid2shard, uuid2))
,*
from (select * from xx where uuid2shard < 192)
Sort of works (count(*) > count(distinct uuid2)):
select
row_number() over (partition by uuid2shard, uuid2 order by (timestamp, uuid2shard, uuid2))
,*
from (select * from xx where uuid2shard < 192)
The result (~800M) contains about 8000 duplicate rows (exactly identical) when it is saved via create table.
Everything is fine with a plain select:
select count(*), count(distinct uuid2) from
(
select
row_number() over (partition by uuid2shard, uuid2 order by (timestamp, uuid2shard, uuid2))
,*
from xx
)
Works again, just by leaving out the remaining 190 columns:
select
row_number() over (partition by uuid2shard, uuid2 order by (timestamp, uuid2shard, uuid2))
, uuid1shard, uuid1, timestamp, uuid2shard, uuid2, smallpayload
from xx
Root cause narrowed down (what is the source of all the duplicates?):
create table yy as --the CREATE, yes, the problem is probably in the create statement
select
row_number() over (partition by uuid2shard, uuid2 order by (timestamp, uuid2shard, uuid2))
,*
from xx
It looks like a 'size' problem, or something has to be configured for this to work correctly.
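One way to pin down the duplicates in the saved table is sketched below. It assumes the row_number() column in the create statement was given an alias, here the hypothetical name rn (the statement above leaves it unnamed). Since row_number() is unique within its partition, any (uuid2shard, uuid2, rn) combination that shows up more than once in yy should be a physically duplicated row rather than a legitimate repeat coming from xx.

```sql
-- Assumption: yy was created with "row_number() over (...) as rn".
-- row_number() is unique per (uuid2shard, uuid2) partition, so every
-- group below is expected to contain exactly one row.
select uuid2shard, uuid2, rn, count(*) as copies
from yy
group by uuid2shard, uuid2, rn
having count(*) > 1;

-- The create statement emits one output row per input row,
-- so these two counts are expected to match.
select count(*) from xx;
select count(*) from yy;
```

If the first query returns groups and yy holds more rows than xx, that would suggest the extra rows are introduced while the table is written, not by the select itself.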
Our environment:
A small one ;) 16 nodes, 384 VCores, etc.