Given a table with the following structure:
subscriber_id,band
11,1
12,1
13,1
...
21,2
22,2
23,2
24,2
...
N1,N
N2,N
N3,N
...
Nn,N
I want to get a subgroup containing n% of the subscribers from each group. For 10%, I should get 10% of group 1, 10% of group 2, ..., 10% of group N.
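Answer 0: (score: 1)
It sounds like you want a stratified sample. You can enumerate the rows within each group in random order and then keep the first n% of each group. Here is an example of how to do this in SQL Server: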
select t.subscriber_id, t.band
from (select t.*,
             -- random position of the row within its band
             row_number() over (partition by band order by rand(checksum(newid()))) as band_seqnum,
             -- number of rows in the band
             count(*) over (partition by band) as band_cnt
      from t
     ) t
where band_seqnum <= 0.10 * band_cnt;
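The idea of numbering rows randomly within each band and keeping those whose position is at most 10% of that band's row count can be tried without a SQL Server instance. The following is a minimal sketch, assuming Python's bundled sqlite3 module linked against SQLite 3.25 or later (needed for window functions). The table contents and the helper name `sample_per_band` are made up for illustration, and SQLite's `random()` stands in for SQL Server's `newid()`.

```python
import sqlite3


def sample_per_band(conn, fraction):
    """Return (subscriber_id, band) rows holding a random `fraction` of each band."""
    query = """
        select subscriber_id, band
        from (select t.*,
                     row_number() over (partition by band order by random()) as band_seqnum,
                     count(*) over (partition by band) as band_cnt
              from t) s
        where band_seqnum <= ? * band_cnt
    """
    return conn.execute(query, (fraction,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("create table t (subscriber_id integer primary key, band integer)")

# Hypothetical data: band 1 has 20 subscribers, band 2 has 50, band 3 has 100.
rows = []
for band, size in [(1, 20), (2, 50), (3, 100)]:
    rows += [(band * 1000 + i, band) for i in range(size)]
conn.executemany("insert into t values (?, ?)", rows)

sample = sample_per_band(conn, 0.10)

# Count how many sampled subscribers came from each band.
per_band = {}
for _, band in sample:
    per_band[band] = per_band.get(band, 0) + 1
print(per_band)
```

With these band sizes the sample holds 2, 5 and 10 subscribers respectively; which subscribers are picked changes on every run. A band with fewer than 10 rows would contribute nothing at 10%, so small bands may need a rounding rule of their own.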
Answer 1: (score: 0)
Try this:
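select *
from (
    -- split each band into 100 random tiles of nearly equal size
    select *, ntile(100) over (partition by band order by newid()) as R
    from t
) t
-- keep the first n tiles for n% (here n = 10); only approximate for bands under 100 rows
where t.R <= 10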
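How a tile-based sample behaves when a band does not divide evenly can be checked with a small experiment. This is a sketch under assumptions, not the answerer's own code: it uses Python's sqlite3 module (SQLite 3.25 or later), invented band sizes, `ntile(10)` with only tile 1 kept for a 10% sample, and `random()` in place of SQL Server's `newid()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (subscriber_id integer primary key, band integer)")

# Hypothetical data: bands of 20, 25 and 100 subscribers.
for band, size in [(1, 20), (2, 25), (3, 100)]:
    conn.executemany(
        "insert into t values (?, ?)",
        [(band * 1000 + i, band) for i in range(size)],
    )

# Ten random tiles per band; keeping tile 1 keeps about 10% of each band.
query = """
    select band, count(*)
    from (select t.*,
                 ntile(10) over (partition by band order by random()) as r
          from t) s
    where r = 1
    group by band
    order by band
"""
tile_sizes = dict(conn.execute(query).fetchall())
print(tile_sizes)  # {1: 2, 2: 3, 3: 10}
```

The band of 25 yields 3 rows (12%) rather than 2, because NTILE places the larger tiles first when the rows cannot be split evenly. If an exact floor or ceiling is required, comparing a row number against the band's row count gives more control.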