Sampling in a SQL query

Time: 2014-11-12 19:29:12

Tags: sql teradata sample random-sample

I'm working on a problem like this:

I have a table with many columns, but the main ones are EmployeeIds and DepartmentId:

Employee Ids    Department Ids
------------------------------
A                   1
B                   1
C                   1
D                   1
AA                  2
BB                  2
CC                  2
A1                  3
B1                  3
C1                  3
D1                  3

I want to write a SQL query that pulls 2 sample EmployeeIds for each DepartmentId, like this:

Employee Id  Dept Ids
B              1
C              1
AA             2
CC             2
D1             3
A1             3

Currently I'm writing the query as:

select
   EmployeeId, DeptIds, count(*)
from 
   table_name
group by 1,2
sample 2

But it gives me only two rows in total, not two rows per department.

Any help?

1 answer:

Answer 0: (score: 1)

If there are only a few departments and you know them in advance, you can do a stratified sample:

select *
from table_name
sample
   when DeptIds = 1 then 2
   when DeptIds = 2 then 2
   when DeptIds = 3 then 2
end

Otherwise, use a combination of RANDOM and ROW_NUMBER:

select *
from
 (
   select EmployeeId, DeptIds, random(1,10000000) as rand
   from table_name
 ) as dt
qualify
   row_number()
   over (partition by DeptIds
         order by rand) <= 2
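
To try the RANDOM and ROW_NUMBER approach on the data from the question, a minimal sketch along these lines should work. The volatile table name demo_emp and the alias rnd are made up for illustration; the column names follow the queries above.

```sql
-- Hypothetical scratch table holding the sample data from the question.
create volatile table demo_emp
 (
   EmployeeId varchar(10),
   DeptIds    integer
 ) on commit preserve rows;

insert into demo_emp values ('A',  1);
insert into demo_emp values ('B',  1);
insert into demo_emp values ('C',  1);
insert into demo_emp values ('D',  1);
insert into demo_emp values ('AA', 2);
insert into demo_emp values ('BB', 2);
insert into demo_emp values ('CC', 2);
insert into demo_emp values ('A1', 3);
insert into demo_emp values ('B1', 3);
insert into demo_emp values ('C1', 3);
insert into demo_emp values ('D1', 3);

-- Give every row a random number in a derived table, then keep the
-- two rows with the smallest random numbers within each department.
select EmployeeId, DeptIds
from
 (
   select EmployeeId, DeptIds, random(1,10000000) as rnd
   from demo_emp
 ) as dt
qualify
   row_number()
   over (partition by DeptIds
         order by rnd) <= 2
order by DeptIds, EmployeeId;
```

Each run should return up to two rows per department, and which employees are picked changes from run to run. A department with fewer than two rows simply returns all the rows it has.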