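Creating groups in one table based on group proportions in another table

Time: 2017-04-04 11:06:43

Tags: sql oracle

Context:

I have two tables. Table A contains data in the following sample format (12 ordered groups A-L, where A = highest and L = lowest):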

ID   | BAND
---- | ---- 
1    | A    
2    | B  
3    | A  
4    | C  
5    | D  
6    | F 
7    | D  
8    | H 
...

Table B contains data in the following sample format:

ID   | SCORE
---- | ---- 
1    | 0.12    
2    | 0.37  
3    | 0.21  
4    | 0.55  
5    | 0.01  
6    | 0.90 
7    | 0.10  
8    | 0.71    
...

I have calculated the proportional size of each group in table A using the following:

 CREATE TABLE table_a_group_pct AS
 SELECT band
 , count(*) * 100.0 / sum(count(*)) over() AS pct 
 FROM table_a 
 GROUP BY band;

Output:

BAND | PCT
---- | ----  
A    | 12  
B    | 15  
C    | 11  
D    | 9 
E    | 10  
F    | 8  
G    | 11  
H    | 10  
I    | 6
J    | 4
K    | 3
L    | 1
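
A running total of these percentages gives the upper boundary of each band, so the boundaries do not have to be worked out by hand. A minimal sketch, assuming the band letters sort in the intended order (A first) and using a hypothetical column alias cum_pct; the CTE only restates the output above so that the query runs on its own:

```sql
WITH table_a_group_pct AS (SELECT 'A' band, 12 pct FROM dual UNION ALL
                           SELECT 'B' band, 15 pct FROM dual UNION ALL
                           SELECT 'C' band, 11 pct FROM dual UNION ALL
                           SELECT 'D' band,  9 pct FROM dual UNION ALL
                           SELECT 'E' band, 10 pct FROM dual UNION ALL
                           SELECT 'F' band,  8 pct FROM dual UNION ALL
                           SELECT 'G' band, 11 pct FROM dual UNION ALL
                           SELECT 'H' band, 10 pct FROM dual UNION ALL
                           SELECT 'I' band,  6 pct FROM dual UNION ALL
                           SELECT 'J' band,  4 pct FROM dual UNION ALL
                           SELECT 'K' band,  3 pct FROM dual UNION ALL
                           SELECT 'L' band,  1 pct FROM dual)
SELECT band,
       pct,
       -- cum_pct (hypothetical alias): share of rows in this band or any higher band
       SUM(pct) OVER (ORDER BY band) AS cum_pct
FROM   table_a_group_pct
ORDER  BY band;
```

With the figures above, band A ends at 12 and band B at 27 (12 + 15), and the last band ends at 100.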

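I want to create 12 ordered groups (ordered by score) for table B, with the same proportional sizes as the groups in table A.

E.g. 12% of the rows in table A have group = A, so the top 12% of rows in table B (based on score) would be given group = A, and so on.

I think I could solve this by using the NTILE(100) function to find the percentile position of each score, and then using CASE WHEN to create the groups manually, based on the cumulative percentage of each group in table A (i.e. if band A holds the top 12% of IDs, I would find the 88th percentile in table B and do the following):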

CASE WHEN score_pct > 88 THEN 'A'                -- top 12%
     WHEN score_pct BETWEEN 74 AND 88 THEN 'B'   -- next 15%; BETWEEN takes the lower bound first
     ...
END AS band
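
One point worth checking with this approach: as far as I understand Oracle's NTILE, when there are fewer rows than buckets only as many buckets as there are rows get filled, so NTILE(100) behaves like a percentile only when table B has at least 100 rows. The sketch below is an illustration under that assumption; it uses the 8 sample rows, and the aliases score_ntile and score_pct are hypothetical names chosen here. It puts NTILE(100) next to a percentage derived from CUME_DIST for comparison:

```sql
WITH table_b AS (SELECT 1 id, 0.12 score FROM dual UNION ALL
                 SELECT 2 id, 0.37 score FROM dual UNION ALL
                 SELECT 3 id, 0.21 score FROM dual UNION ALL
                 SELECT 4 id, 0.55 score FROM dual UNION ALL
                 SELECT 5 id, 0.01 score FROM dual UNION ALL
                 SELECT 6 id, 0.90 score FROM dual UNION ALL
                 SELECT 7 id, 0.10 score FROM dual UNION ALL
                 SELECT 8 id, 0.71 score FROM dual)
SELECT id,
       score,
       -- bucket number out of 100 requested buckets, lowest score first
       NTILE(100) OVER (ORDER BY score) AS score_ntile,
       -- percentage of rows with a score at or below this one
       ROUND(100 * CUME_DIST() OVER (ORDER BY score), 1) AS score_pct
FROM   table_b
ORDER  BY score;
```

On a large table the two columns should be close to each other; on a small one, score_pct is the safer value to compare against band boundaries.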

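However, I am trying to find out whether there is a smarter way to solve this.

Additional information: table A and table B are different sizes and do not share exactly the same IDs; I am only trying to create groups of similar proportions.

My expected output is something like this: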

ID   | SCORE | BAND
---- | ----  | ----
1    | 0.12  | K/11
2    | 0.37  | G/7
3    | 0.21  | H/8
4    | 0.55  | E/5
5    | 0.01  | L/12
6    | 0.90  | A/1
7    | 0.10  | K/11
8    | 0.71  | B/2  

[Edited my question to add clarity]

1 Answer:

Answer 0: (score: 1)

This can be done by using the cume_dist analytic function along with some funky joins (pre-12c), like so:

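(N.B. I've amended the data in table_a so that its 8 rows cover only the consecutive bands A to F; this doesn't match your sample data, so don't be surprised when my output doesn't match yours.)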

WITH table_a AS (SELECT 1 ID, 'A' band FROM dual UNION ALL
                 SELECT 2 ID, 'B' band FROM dual UNION ALL
                 SELECT 3 ID, 'A' band FROM dual UNION ALL
                 SELECT 4 ID, 'C' band FROM dual UNION ALL
                 SELECT 5 ID, 'D' band FROM dual UNION ALL
                 SELECT 6 ID, 'E' band FROM dual UNION ALL
                 SELECT 7 ID, 'D' band FROM dual UNION ALL
                 SELECT 8 ID, 'F' band FROM dual),
     table_b AS (SELECT 1 ID, 0.12 score FROM dual UNION ALL
                 SELECT 2 ID, 0.37 score FROM dual UNION ALL
                 SELECT 3 ID, 0.21 score FROM dual UNION ALL
                 SELECT 4 ID, 0.55 score FROM dual UNION ALL
                 SELECT 5 ID, 0.01 score FROM dual UNION ALL
                 SELECT 6 ID, 0.90 score FROM dual UNION ALL
                 SELECT 7 ID, 0.10 score FROM dual UNION ALL
                 SELECT 8 ID, 0.71 score FROM dual),
-- end of data set-up, see the rest of the query below:
        a_pc AS (SELECT DISTINCT band,
                        cume_dist() OVER (ORDER BY band) pc_cume_dist
                 FROM   table_a),
        b_pc AS (SELECT id,
                        score,
                        cume_dist() OVER (ORDER BY score DESC) pc_cume_dist
                 FROM   table_b)
SELECT b_pc.id,
       b_pc.score,
       b_pc.pc_cume_dist,
       min(a_pc.band) band
FROM   b_pc
       INNER JOIN a_pc ON (a_pc.band = CASE WHEN b_pc.pc_cume_dist <= a_pc.pc_cume_dist AND a_pc.band = 'A' THEN 'A'
                                            WHEN b_pc.pc_cume_dist <= a_pc.pc_cume_dist AND a_pc.band = 'B' THEN 'B'
                                            WHEN b_pc.pc_cume_dist <= a_pc.pc_cume_dist AND a_pc.band = 'C' THEN 'C'
                                            WHEN b_pc.pc_cume_dist <= a_pc.pc_cume_dist AND a_pc.band = 'D' THEN 'D'
                                            WHEN b_pc.pc_cume_dist <= a_pc.pc_cume_dist AND a_pc.band = 'E' THEN 'E'
                                            WHEN b_pc.pc_cume_dist <= a_pc.pc_cume_dist AND a_pc.band = 'F' THEN 'F'
                                       END)
GROUP BY b_pc.id, b_pc.score, b_pc.pc_cume_dist
ORDER BY b_pc.score DESC;

        ID      SCORE PC_CUME_DIST BAND
---------- ---------- ------------ ----
         6        0.9        0.125 A
         8       0.71         0.25 A
         4       0.55        0.375 B
         2       0.37          0.5 C
         3       0.21        0.625 D
         1       0.12         0.75 D
         7        0.1        0.875 E
         5       0.01            1 F
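
To see why the join picks these bands, it can help to run the a_pc subquery on its own. The sketch below repeats only the table_a set-up from the query above; the values in the comments follow from the definition of cume_dist (rows with a band at or before the current one, divided by the total of 8 rows):

```sql
WITH table_a AS (SELECT 1 ID, 'A' band FROM dual UNION ALL
                 SELECT 2 ID, 'B' band FROM dual UNION ALL
                 SELECT 3 ID, 'A' band FROM dual UNION ALL
                 SELECT 4 ID, 'C' band FROM dual UNION ALL
                 SELECT 5 ID, 'D' band FROM dual UNION ALL
                 SELECT 6 ID, 'E' band FROM dual UNION ALL
                 SELECT 7 ID, 'D' band FROM dual UNION ALL
                 SELECT 8 ID, 'F' band FROM dual)
SELECT DISTINCT band,
       cume_dist() OVER (ORDER BY band) pc_cume_dist
FROM   table_a
ORDER  BY band;
-- A: 2/8 = 0.25    B: 3/8 = 0.375   C: 4/8 = 0.5
-- D: 6/8 = 0.75    E: 7/8 = 0.875   F: 8/8 = 1
```

Each row of table_b is then given the first band whose pc_cume_dist is at least as large as the row's own, which is why the scores at 0.125 and 0.25 both land in band A.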

Alternatively, in 12c, you can use a LATERAL join, like so:

WITH table_a AS (SELECT 1 ID, 'A' band FROM dual UNION ALL 
                 SELECT 2 ID, 'B' band FROM dual UNION ALL 
                 SELECT 3 ID, 'A' band FROM dual UNION ALL 
                 SELECT 4 ID, 'C' band FROM dual UNION ALL 
                 SELECT 5 ID, 'D' band FROM dual UNION ALL 
                 SELECT 6 ID, 'E' band FROM dual UNION ALL 
                 SELECT 7 ID, 'D' band FROM dual UNION ALL 
                 SELECT 8 ID, 'F' band FROM dual), 
     table_b AS (SELECT 1 ID, 0.12 score FROM dual UNION ALL 
                 SELECT 2 ID, 0.37 score FROM dual UNION ALL 
                 SELECT 3 ID, 0.21 score FROM dual UNION ALL 
                 SELECT 4 ID, 0.55 score FROM dual UNION ALL 
                 SELECT 5 ID, 0.01 score FROM dual UNION ALL 
                 SELECT 6 ID, 0.90 score FROM dual UNION ALL 
                 SELECT 7 ID, 0.10 score FROM dual UNION ALL 
                 SELECT 8 ID, 0.71 score FROM dual), 
        a_pc AS (SELECT DISTINCT band, 
                        cume_dist() OVER (ORDER BY band) pc_cume_dist 
                 FROM   table_a), 
        b_pc AS (SELECT id, 
                        score, 
                        cume_dist() OVER (ORDER BY score DESC) pc_cume_dist 
                 FROM   table_b) 
SELECT b_pc.id, 
       b_pc.score, 
       b_pc.pc_cume_dist, 
       a_pc2.band 
FROM   b_pc, 
       lateral (SELECT MIN(band) band 
                FROM   a_pc 
                WHERE  a_pc.pc_cume_dist >= b_pc.pc_cume_dist) a_pc2 
ORDER BY b_pc.score DESC;

        ID      SCORE PC_CUME_DIST BAND
---------- ---------- ------------ ----
         6        0.9        0.125 A
         8       0.71         0.25 A
         4       0.55        0.375 B
         2       0.37          0.5 C
         3       0.21        0.625 D
         1       0.12         0.75 D
         7        0.1        0.875 E
         5       0.01            1 F

Here is an example of it running on Oracle LiveSQL (which is at version 12.2).
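
Another way to express the same idea, offered only as a sketch and not as part of the queries above, is to give each band an explicit lower and upper cumulative bound and join on that range. This avoids both the CASE expression and LATERAL, so it should not depend on 12c. The names a_bounds, lower_bound and upper_bound are hypothetical, and the sketch assumes the band letters sort in the intended order:

```sql
WITH table_a AS (SELECT 1 ID, 'A' band FROM dual UNION ALL
                 SELECT 2 ID, 'B' band FROM dual UNION ALL
                 SELECT 3 ID, 'A' band FROM dual UNION ALL
                 SELECT 4 ID, 'C' band FROM dual UNION ALL
                 SELECT 5 ID, 'D' band FROM dual UNION ALL
                 SELECT 6 ID, 'E' band FROM dual UNION ALL
                 SELECT 7 ID, 'D' band FROM dual UNION ALL
                 SELECT 8 ID, 'F' band FROM dual),
     table_b AS (SELECT 1 ID, 0.12 score FROM dual UNION ALL
                 SELECT 2 ID, 0.37 score FROM dual UNION ALL
                 SELECT 3 ID, 0.21 score FROM dual UNION ALL
                 SELECT 4 ID, 0.55 score FROM dual UNION ALL
                 SELECT 5 ID, 0.01 score FROM dual UNION ALL
                 SELECT 6 ID, 0.90 score FROM dual UNION ALL
                 SELECT 7 ID, 0.10 score FROM dual UNION ALL
                 SELECT 8 ID, 0.71 score FROM dual),
-- end of data set-up
     a_bounds AS (SELECT band,
                         -- fraction of rows in strictly higher bands
                         (SUM(COUNT(*)) OVER (ORDER BY band) - COUNT(*))
                           / SUM(COUNT(*)) OVER () lower_bound,
                         -- fraction of rows in this band or any higher band
                         SUM(COUNT(*)) OVER (ORDER BY band)
                           / SUM(COUNT(*)) OVER () upper_bound
                  FROM   table_a
                  GROUP  BY band),
         b_pc AS (SELECT id,
                         score,
                         cume_dist() OVER (ORDER BY score DESC) pc_cume_dist
                  FROM   table_b)
SELECT b_pc.id,
       b_pc.score,
       b_pc.pc_cume_dist,
       a_bounds.band
FROM   b_pc
       INNER JOIN a_bounds ON  b_pc.pc_cume_dist >  a_bounds.lower_bound
                           AND b_pc.pc_cume_dist <= a_bounds.upper_bound
ORDER BY b_pc.score DESC;
```

Because the (lower_bound, upper_bound] intervals cover everything from 0 to 1 without overlapping, each row of table_b should match exactly one band, and on the sample data this should give the same band assignments as the two queries above. Rows with tied scores share a cume_dist value, so they fall into the same band.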