I've run into an SQL aggregation problem.
Consider the following table/view:
Column1 Column2
1 2564
2 6550
1 3578
2 6548
2 4789
1 9876
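I want to write a query that does the following:
For each distinct Column1 value, sample 2 records. The sampling strategy could be some kind of bootstrap/resampling, since there may not be many data points.
So the table would become: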
Column1 Column2
1 9876
1 3578
2 6548
2 6550
Platform: MS SQL
Thanks for any answers.
Answer 0 (score: 3)
For a random sample without replacement:
select t.*
from (select t.*,
row_number() over (partition by column1 order by newid()) as seqnum
from t
) t
where seqnum <= 2;
Or, alternatively:
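-- rank 0 for the first two randomly ordered rows of each group, 1 for the rest;
-- top (1) with ties then keeps every rank-0 row
select top (1) with ties t.*
from t
order by case when row_number() over (partition by column1 order by newid()) <= 2 then 0 else 1 end;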
For a random sample with replacement (two independent draws per group, so the same row can appear twice):
-- each branch independently picks one random row per column1 value
select a.*
from (select top (1) with ties t.*
      from t
      order by row_number() over (partition by column1 order by newid())
     ) a
union all
select b.*
from (select top (1) with ties t.*
      from t
      order by row_number() over (partition by column1 order by newid())
     ) b;
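The query above hard-codes two draws per group, one `union all` branch per draw. Below is a minimal sketch that generalizes to k draws with replacement, assuming SQL Server 2008 or later (for the `values` row constructor and inline `declare` initialization). It is not part of the original answer and swaps in a different technique: number the rows within each group, pick a random position per draw with `abs(checksum(newid()) % cnt) + 1`, then join back. The names `#t`, `#numbered`, `#draws`, `#sample` and `@k` are hypothetical; `#t` simply holds the question's sample data as a stand-in for the real table `t`.

```sql
-- Hypothetical stand-in for the real table: the sample data from the question.
create table #t (column1 int, column2 int);
insert into #t (column1, column2)
values (1, 2564), (2, 6550), (1, 3578), (2, 6548), (2, 4789), (1, 9876);

declare @k int = 2;  -- hypothetical parameter: draws per column1 value (1..10 here)

-- Number the rows inside each column1 group and record the group size.
select column1, column2,
       row_number() over (partition by column1 order by (select null)) as rn,
       count(*) over (partition by column1) as cnt
into #numbered
from #t;

-- One row per (group, draw), each with an independent random position 1..cnt.
-- Taking the modulo before abs() avoids overflow when checksum() returns INT_MIN.
-- Materializing this step makes NEWID() run exactly once per draw.
select g.column1,
       d.draw,
       abs(checksum(newid()) % g.cnt) + 1 as pick
into #draws
from (select distinct column1, cnt from #numbered) as g
cross join (values (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)) as d(draw)
where d.draw <= @k;

-- Resolve each draw to a row; the same row may be picked more than once.
select n.column1, n.column2, d.draw
into #sample
from #draws as d
join #numbered as n
  on n.column1 = d.column1
 and n.rn = d.pick;

select column1, column2
from #sample
order by column1, draw;
```

For a classic bootstrap, each group is usually resampled to its own size; under the same assumptions that would mean replacing `d.draw <= @k` with `d.draw <= g.cnt` and extending the `values` list to cover the largest group. The random picks are written to `#draws` on purpose: if that expression stays inside a CTE or derived table, SQL Server may re-evaluate `NEWID()` during the join.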