Context: I have a table with the columns RetailerCode, CID (the customer ID) and Segment, as shown below.
RetailerCode CID Segment
A6005 13SVC15 High
A6005 19VDE1F Low
A6005 1B3BD1F Medium
A6005 1B3HB48 Medium
A6005 1B3HB49 Low
A9006 1B3HB40 High
A9006 1B3HB41 High
A9006 1B3HB43 Low
A9006 1B3HB46 Medium
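I want to split this data set into control and test groups. Each RetailerCode has a set of customers, and each customer is tagged with a segment. The split should work as follows.
For each retailer:
- 10% of the High customers go to control, and the remaining 90% of the High customers go to test.
- 10% of the Medium customers go to control, and the remaining 90% of the Medium customers go to test.
- 10% of the Low customers go to control, and the remaining 90% of the Low customers go to test.
I tried the code below, but I know it is wrong.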
select RetailerCode, CID,Segment
(case when row_number() over (order by newid()) <= (select 0.1* count(*) from Table)
then 'control'
else 'test'
end) as group
from Table
group by RetailerCode, CID,Segment
Order by RetailerCode
Can someone help me? Thanks in advance.
Answer 0 (score: 0)
You seem pretty close:
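-- Partition by retailer and segment so that the 10% is taken within each group.
-- [group] and [Table] are bracketed because GROUP and TABLE are reserved words.
select RetailerCode, CID, Segment,
       (case when row_number() over (partition by RetailerCode, Segment order by newid()) <=
                  0.1 * count(*) over (partition by RetailerCode, Segment)
             then 'control'
             else 'test'
        end) as [group]
from [Table]
order by RetailerCode;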
I don't see why the group by is needed.
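One thing to watch with the row_number() comparison: row_number() starts at 1, so a partition with fewer than 10 rows has 0.1 * count(*) below 1 and gets no control rows at all. No partition in the sample rows from the question reaches 10 rows, so every customer there would be labelled test. The sketch below only reproduces that arithmetic in Python (it is not T-SQL); control_size and its round_up flag are hypothetical names introduced for illustration, and rounding the threshold up is one possible adjustment, not something the question asked for.

```python
from fractions import Fraction
import math

TEN_PERCENT = Fraction(1, 10)  # exact, like the decimal literal 0.1 in SQL


def control_size(n, round_up=False):
    """Count rows in a partition of n rows whose row number passes the threshold."""
    threshold = TEN_PERCENT * n
    if round_up:
        # Analogous to wrapping the product in CEILING() on the SQL side.
        threshold = math.ceil(threshold)
    # Row numbers are 1-based, as with row_number().
    return sum(1 for row_number in range(1, n + 1) if row_number <= threshold)


# Partition size, control rows as written, control rows with the threshold rounded up.
for n in (2, 9, 10, 25, 100):
    print(n, control_size(n), control_size(n, round_up=True))
```

If every retailer/segment group must contribute at least one control customer, rounding the threshold up is one way to get there, at the cost of taking more than 10% from small groups.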
Answer 1 (score: 0)
percent_rank is based on rank and count:
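-- Partition by retailer and segment; [group] and [Table] are bracketed because they are reserved words.
select RetailerCode, CID, Segment,
       (case when percent_rank() over (partition by RetailerCode, Segment order by newid()) <= 0.1
             then 'control'
             else 'test'
        end) as [group]
from [Table]
order by RetailerCode;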
And ntile is based on row_number and count:
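-- ntile(10) cuts each retailer/segment partition into 10 buckets; bucket 1 becomes control.
select RetailerCode, CID, Segment,
       (case when ntile(10) over (partition by RetailerCode, Segment order by newid()) = 1
             then 'control'
             else 'test'
        end) as [group]
from [Table]
order by RetailerCode;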
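Both variants round the control share up rather than down: percent_rank() is 0 for the first row of any partition, and ntile(10) fills bucket 1 first, so every non-empty partition gets at least one control row. The sketch below mimics the two window functions in plain Python to show the resulting control sizes. It assumes the ordering key has no ties (which is what order by newid() gives in practice), and percent_rank_control and ntile_control are hypothetical helper names, not part of any library.

```python
from fractions import Fraction


def percent_rank_control(n):
    """Rows with percent_rank() <= 0.1 in a partition of n distinctly ordered rows."""
    if n == 1:
        return 1  # percent_rank() of a single row is 0
    limit = Fraction(1, 10)
    # percent_rank = (rank - 1) / (rows in partition - 1)
    return sum(1 for rank in range(1, n + 1) if Fraction(rank - 1, n - 1) <= limit)


def ntile_control(n, buckets=10):
    """Rows that land in bucket 1 of ntile(buckets) for a partition of n rows."""
    base, extra = divmod(n, buckets)
    # When n is not divisible by the bucket count, the larger buckets come first.
    return base + (1 if extra > 0 else 0)


# Partition size, control rows via percent_rank, control rows via ntile.
for n in (1, 2, 9, 10, 11, 25, 100):
    print(n, percent_rank_control(n), ntile_control(n))
```

Under these assumptions the two rules pick the same number of control rows for every partition size, so the choice between them is mostly a matter of readability.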