I have 2 similar tables:
TBL-1
-----
Userid, score
TBL-2
-----
Userid, score
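Each table holds scores produced by a different algorithm. I need to create a data set that takes an equal number of records from each table, and the two subsets must also be disjoint. What is an efficient way (in terms of execution time) to do this?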
Edit 1: One important pointer: both tables contain (almost) exactly the same users, but the scores come from different algorithms.
PS: I know I can use NOT IN (with a CTE or sub-query), but I don't think that is the optimal solution.
Answer 0 (score: 1)
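This is tricky. I think I would start with a full join and then do some enumeration. The idea is to enumerate the users in the overlapping set; using modulo arithmetic, half of them go to each side.
Then, calculate the minimum number of "extra" (unmatched) rows across the two tables. That smaller number of rows is taken from each set.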
select coalesce(userid1, userid2) as userid,
       (case when userid1 is null then score2
             when userid2 is null then score1
             when both_seqnum % 2 = 0 then score1
             else score2
        end) as score,
       (case when userid1 is null then 'tbl_2'
             when userid2 is null then 'tbl_1'
             when both_seqnum % 2 = 0 then 'tbl_1'
             else 'tbl_2'
        end) as which
from (select t1.userid as userid1, t2.userid as userid2,
             t1.score as score1, t2.score as score2,
             -- smaller of the two unmatched ("extra") row counts
             (case when count(t1.userid) over () < count(t2.userid) over ()
                   then sum(case when t2.userid is null then 1 else 0 end) over ()
                   else sum(case when t1.userid is null then 1 else 0 end) over ()
              end) as extra_count,
             -- sequence number among rows that exist only in tbl1
             (case when t2.userid is null
                   then row_number() over (partition by (case when t2.userid is null then 1 else 0 end)
                                           order by t1.userid
                                          )
              end) as t1_seqnum,
             -- sequence number among rows that exist only in tbl2
             (case when t1.userid is null
                   then row_number() over (partition by (case when t1.userid is null then 1 else 0 end)
                                           order by t2.userid
                                          )
              end) as t2_seqnum,
             -- sequence number among rows present in both tables
             (case when t1.userid is not null and t2.userid is not null
                   then row_number() over (partition by (case when t1.userid is not null and t2.userid is not null then 1 else 0 end)
                                           order by t1.userid
                                          )
              end) as both_seqnum
      from tbl1 t1 full join
           tbl2 t2
           on t1.userid = t2.userid
     ) t12
where (userid1 is not null and userid2 is not null) or
      (userid2 is null and t1_seqnum <= extra_count) or
      (userid1 is null and t2_seqnum <= extra_count)
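To make the selection rule easier to follow, here is a minimal Python sketch of the same idea outside the database. It is only an illustration under assumptions: each table is modelled as a dict from userid to score, userids are unique within a table, and the function name `balanced_disjoint_sample` and the sample data are hypothetical.

```python
def balanced_disjoint_sample(tbl1, tbl2):
    """Pick rows so each user appears once and both sources contribute equally.

    tbl1, tbl2: dicts mapping userid -> score (hypothetical stand-ins for the tables).
    Returns a list of (userid, score, source) tuples.
    """
    both = sorted(tbl1.keys() & tbl2.keys())    # users present in both tables
    only1 = sorted(tbl1.keys() - tbl2.keys())   # "extra" rows in tbl1
    only2 = sorted(tbl2.keys() - tbl1.keys())   # "extra" rows in tbl2
    extra_count = min(len(only1), len(only2))   # smaller number of extras

    rows = []
    # Alternate the overlapping users between the two sources (modulo arithmetic).
    for seqnum, uid in enumerate(both, start=1):
        if seqnum % 2 == 0:
            rows.append((uid, tbl1[uid], "tbl_1"))
        else:
            rows.append((uid, tbl2[uid], "tbl_2"))
    # Take the same number of unmatched rows from each side.
    for uid in only1[:extra_count]:
        rows.append((uid, tbl1[uid], "tbl_1"))
    for uid in only2[:extra_count]:
        rows.append((uid, tbl2[uid], "tbl_2"))
    return rows


tbl1 = {1: 0.9, 2: 0.4, 3: 0.7, 4: 0.1, 5: 0.3, 6: 0.8}
tbl2 = {1: 0.5, 2: 0.6, 3: 0.2, 4: 0.95, 7: 0.35}
result = balanced_disjoint_sample(tbl1, tbl2)
for row in result:
    print(row)
```

With this sample data, four users overlap and are split two and two, and one unmatched user is taken from each side, so each source contributes three rows. If the overlap has an odd number of users, the two sides differ by one row.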
Answer 1 (score: 0)
Use a full outer join and coalesce the ID columns to compare the scores from the two tables side by side:
select coalesce(t1.UserID, t2.UserID) as userid,
       t1.score as t1_score,
       t2.score as t2_score
from tbl1 t1
     full outer join tbl2 t2
     on t1.userid = t2.userid
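As a rough way to try the side-by-side comparison locally, the following sketch uses Python's built-in `sqlite3` module. It makes several assumptions: the table names `tbl1` and `tbl2`, the sample rows, and the variable names are made up for illustration, and because older SQLite versions do not support FULL OUTER JOIN, the join is emulated with a LEFT JOIN plus a UNION ALL of the rows found only in `tbl2`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table tbl1 (userid integer primary key, score real);
    create table tbl2 (userid integer primary key, score real);
    insert into tbl1 values (1, 0.9), (2, 0.4), (3, 0.7);
    insert into tbl2 values (2, 0.6), (3, 0.2), (4, 0.5);
""")

# Emulated full outer join: every tbl1 row with its tbl2 match (if any),
# followed by the tbl2 rows that have no partner in tbl1.
side_by_side = conn.execute("""
    select t1.userid as userid, t1.score as t1_score, t2.score as t2_score
    from tbl1 t1 left join tbl2 t2 on t1.userid = t2.userid
    union all
    select t2.userid, null, t2.score
    from tbl2 t2
    where not exists (select 1 from tbl1 t1 where t1.userid = t2.userid)
    order by 1
""").fetchall()

for row in side_by_side:
    print(row)
```

Users missing from one table show up with a NULL (`None` in Python) in that table's score column, which makes the gaps easy to spot.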
Alternatively, find the rows that do not match between the tables (all rows in t1 with no match in t2, and vice versa):
select t1.UserID, t1.score, 'T1' as source_tab
from tbl1 t1
where not exists (select 1 from tbl2 t2 where t2.UserID = t1.UserID and t2.score = t1.score)
union all
select t2.UserID, t2.score, 'T2' as source_tab
from tbl2 t2
where not exists (select 1 from tbl1 t1 where t1.UserID = t2.UserID and t1.score = t2.score)
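The mismatch query can be checked on a small data set with Python's built-in `sqlite3` module. This is a sketch under assumptions: the tables are assumed to be named `tbl1` and `tbl2`, and the sample rows and the `mismatches` variable are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table tbl1 (userid integer primary key, score real);
    create table tbl2 (userid integer primary key, score real);
    insert into tbl1 values (1, 0.9), (2, 0.4), (3, 0.7);
    insert into tbl2 values (2, 0.4), (3, 0.2), (4, 0.5);
""")

# Rows of each table that have no identical (userid, score) row in the other.
mismatches = conn.execute("""
    select t1.userid, t1.score, 'T1' as source_tab
    from tbl1 t1
    where not exists (select 1 from tbl2 t2
                      where t2.userid = t1.userid and t2.score = t1.score)
    union all
    select t2.userid, t2.score, 'T2' as source_tab
    from tbl2 t2
    where not exists (select 1 from tbl1 t1
                      where t1.userid = t2.userid and t1.score = t2.score)
    order by 3, 1
""").fetchall()

for row in mismatches:
    print(row)
```

Because the score is part of the match condition, a user whose scores differ between the tables (user 3 in this sample) is returned from both sides. If the two halves must be disjoint by userid, the score comparison would need to be dropped from the NOT EXISTS conditions.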