Context:
I have two tables. Table A contains data in the following sample format (12 ordered groups A-L, A = highest, L = lowest):
ID | BAND
---- | ----
1 | A
2 | B
3 | A
4 | C
5 | D
6 | F
7 | D
8 | H
...
Table B contains data in the following sample format:
ID | SCORE
---- | ----
1 | 0.12
2 | 0.37
3 | 0.21
4 | 0.55
5 | 0.01
6 | 0.90
7 | 0.10
8 | 0.71
...
I've calculated the proportional size of each group in Table A using the following:
CREATE TABLE table_a_group_pct AS
SELECT band
, count(*) * 100.0 / sum(count(*)) over() AS pct
FROM table_a
GROUP BY band;
Output:
BAND | PCT
---- | ----
A | 12
B | 15
C | 11
D | 9
E | 10
F | 8
G | 11
H | 10
I | 6
J | 4
K | 3
L | 1
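I would like to create 12 ordered (by score) groups for Table B whose proportional sizes match those of the groups in Table A.
E.g. 12% of the rows in Table A have band = A, so the top 12% of rows in Table B (based on score) would be given band = A, and so on.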
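I think I can work around the problem by using the NTILE(100) function to find the percentile position of each score, then using CASE WHEN to create the groups manually from the cumulative percentage of each group in Table A (i.e. if band A holds the top 12% of IDs, I would find the 88th percentile in Table B and do):
CASE WHEN score_pct > 88 THEN 'A'
     WHEN score_pct BETWEEN 74 AND 88 THEN 'B' ...
END AS band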
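The percentile cut-offs for that CASE expression need not be worked out by hand. As a rough sketch, a running total over the percentages from the output above gives each band's lower boundary. The column names cum_pct and lower_pctile are illustrative assumptions, and the inline WITH clause merely stands in for table_a_group_pct:

```sql
-- Stand-in for table_a_group_pct, using the percentages shown in the output above.
WITH table_a_group_pct AS (SELECT 'A' band, 12 pct FROM dual UNION ALL
                           SELECT 'B', 15 FROM dual UNION ALL
                           SELECT 'C', 11 FROM dual UNION ALL
                           SELECT 'D',  9 FROM dual UNION ALL
                           SELECT 'E', 10 FROM dual UNION ALL
                           SELECT 'F',  8 FROM dual UNION ALL
                           SELECT 'G', 11 FROM dual UNION ALL
                           SELECT 'H', 10 FROM dual UNION ALL
                           SELECT 'I',  6 FROM dual UNION ALL
                           SELECT 'J',  4 FROM dual UNION ALL
                           SELECT 'K',  3 FROM dual UNION ALL
                           SELECT 'L',  1 FROM dual)
SELECT band,
       pct,
       -- share of rows in this band and every higher band (hypothetical column name)
       SUM(pct) OVER (ORDER BY band) AS cum_pct,
       -- percentile at which this band starts (hypothetical column name)
       100 - SUM(pct) OVER (ORDER BY band) AS lower_pctile
FROM   table_a_group_pct
ORDER  BY band;
```

For band A this gives cum_pct = 12 and lower_pctile = 88, i.e. the 88th percentile mentioned above; for band B it gives 27 and 73.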
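However, I'm trying to understand whether there is a smarter way to solve this.
Additional info: Table A and Table B are different sizes and do not share exactly the same IDs; I am only trying to create groups of similar proportions.
My expected output would be something like this: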
ID | SCORE | BAND
---- | ---- | ----
1 | 0.12 | K/11
2 | 0.37 | G/7
3 | 0.21 | H/8
4 | 0.55 | E/5
5 | 0.01 | L/12
6 | 0.90 | A/1
7 | 0.10 | K/11
8 | 0.71 | B/2
[Edited my question to add clarity]
Answer 0 (score: 1)
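This can be achieved using the cume_dist analytic function, along with some funky joining (pre-12c), like so:
(N.B. I've amended the data in table_a so that its 8 rows only use the first few bands, A to F; this doesn't match your sample data, so don't be surprised when my output doesn't match yours.)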
WITH table_a AS (SELECT 1 ID, 'A' band FROM dual UNION ALL
                 SELECT 2 ID, 'B' band FROM dual UNION ALL
                 SELECT 3 ID, 'A' band FROM dual UNION ALL
                 SELECT 4 ID, 'C' band FROM dual UNION ALL
                 SELECT 5 ID, 'D' band FROM dual UNION ALL
                 SELECT 6 ID, 'E' band FROM dual UNION ALL
                 SELECT 7 ID, 'D' band FROM dual UNION ALL
                 SELECT 8 ID, 'F' band FROM dual),
     table_b AS (SELECT 1 ID, 0.12 score FROM dual UNION ALL
                 SELECT 2 ID, 0.37 score FROM dual UNION ALL
                 SELECT 3 ID, 0.21 score FROM dual UNION ALL
                 SELECT 4 ID, 0.55 score FROM dual UNION ALL
                 SELECT 5 ID, 0.01 score FROM dual UNION ALL
                 SELECT 6 ID, 0.90 score FROM dual UNION ALL
                 SELECT 7 ID, 0.10 score FROM dual UNION ALL
                 SELECT 8 ID, 0.71 score FROM dual),
-- end of data set-up, see the rest of the query below:
        a_pc AS (SELECT DISTINCT band,
                        cume_dist() OVER (ORDER BY band) pc_cume_dist
                 FROM   table_a),
        b_pc AS (SELECT id,
                        score,
                        cume_dist() OVER (ORDER BY score DESC) pc_cume_dist
                 FROM   table_b)
SELECT b_pc.id,
       b_pc.score,
       b_pc.pc_cume_dist,
       min(a_pc.band) band
FROM   b_pc
       INNER JOIN a_pc ON (a_pc.band = CASE WHEN b_pc.pc_cume_dist <= a_pc.pc_cume_dist AND a_pc.band = 'A' THEN 'A'
                                            WHEN b_pc.pc_cume_dist <= a_pc.pc_cume_dist AND a_pc.band = 'B' THEN 'B'
                                            WHEN b_pc.pc_cume_dist <= a_pc.pc_cume_dist AND a_pc.band = 'C' THEN 'C'
                                            WHEN b_pc.pc_cume_dist <= a_pc.pc_cume_dist AND a_pc.band = 'D' THEN 'D'
                                            WHEN b_pc.pc_cume_dist <= a_pc.pc_cume_dist AND a_pc.band = 'E' THEN 'E'
                                            WHEN b_pc.pc_cume_dist <= a_pc.pc_cume_dist AND a_pc.band = 'F' THEN 'F'
                                       END)
GROUP BY b_pc.id, b_pc.score, b_pc.pc_cume_dist
ORDER BY b_pc.score DESC;
        ID      SCORE PC_CUME_DIST BAND
---------- ---------- ------------ ----
         6        0.9        0.125 A
         8       0.71         0.25 A
         4       0.55        0.375 B
         2       0.37          0.5 C
         3       0.21        0.625 D
         1       0.12         0.75 D
         7        0.1        0.875 E
         5       0.01            1 F
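The CASE expression in the join above lists bands A to F explicitly, so it would need extending before it could cover all twelve bands. As a minimal sketch, and assuming nothing beyond the a_pc and b_pc subqueries already defined, the same result should be obtainable by joining on the cume_dist comparison alone and keeping MIN(band). This variant is a suggested simplification, not part of the original answer:

```sql
WITH table_a AS (SELECT 1 ID, 'A' band FROM dual UNION ALL
                 SELECT 2 ID, 'B' band FROM dual UNION ALL
                 SELECT 3 ID, 'A' band FROM dual UNION ALL
                 SELECT 4 ID, 'C' band FROM dual UNION ALL
                 SELECT 5 ID, 'D' band FROM dual UNION ALL
                 SELECT 6 ID, 'E' band FROM dual UNION ALL
                 SELECT 7 ID, 'D' band FROM dual UNION ALL
                 SELECT 8 ID, 'F' band FROM dual),
     table_b AS (SELECT 1 ID, 0.12 score FROM dual UNION ALL
                 SELECT 2 ID, 0.37 score FROM dual UNION ALL
                 SELECT 3 ID, 0.21 score FROM dual UNION ALL
                 SELECT 4 ID, 0.55 score FROM dual UNION ALL
                 SELECT 5 ID, 0.01 score FROM dual UNION ALL
                 SELECT 6 ID, 0.90 score FROM dual UNION ALL
                 SELECT 7 ID, 0.10 score FROM dual UNION ALL
                 SELECT 8 ID, 0.71 score FROM dual),
     -- cumulative share of table_a rows up to and including each band
     a_pc AS (SELECT DISTINCT band,
                     cume_dist() OVER (ORDER BY band) pc_cume_dist
              FROM   table_a),
     -- cumulative share of table_b rows, counted from the highest score down
     b_pc AS (SELECT id,
                     score,
                     cume_dist() OVER (ORDER BY score DESC) pc_cume_dist
              FROM   table_b)
SELECT b_pc.id,
       b_pc.score,
       b_pc.pc_cume_dist,
       -- the highest-ranked band whose cumulative share reaches this row's position
       MIN(a_pc.band) band
FROM   b_pc
       INNER JOIN a_pc ON b_pc.pc_cume_dist <= a_pc.pc_cume_dist
GROUP BY b_pc.id, b_pc.score, b_pc.pc_cume_dist
ORDER BY b_pc.score DESC;
```

With the sample data this should assign the same band to each row as the query above (for example, ID 6 maps to A and ID 5 to F), and no band letters are hard-coded, so it does not depend on which bands table_a contains.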
Alternatively, in 12c, you can use a LATERAL join, like so:
WITH table_a AS (SELECT 1 ID, 'A' band FROM dual UNION ALL
                 SELECT 2 ID, 'B' band FROM dual UNION ALL
                 SELECT 3 ID, 'A' band FROM dual UNION ALL
                 SELECT 4 ID, 'C' band FROM dual UNION ALL
                 SELECT 5 ID, 'D' band FROM dual UNION ALL
                 SELECT 6 ID, 'E' band FROM dual UNION ALL
                 SELECT 7 ID, 'D' band FROM dual UNION ALL
                 SELECT 8 ID, 'F' band FROM dual),
     table_b AS (SELECT 1 ID, 0.12 score FROM dual UNION ALL
                 SELECT 2 ID, 0.37 score FROM dual UNION ALL
                 SELECT 3 ID, 0.21 score FROM dual UNION ALL
                 SELECT 4 ID, 0.55 score FROM dual UNION ALL
                 SELECT 5 ID, 0.01 score FROM dual UNION ALL
                 SELECT 6 ID, 0.90 score FROM dual UNION ALL
                 SELECT 7 ID, 0.10 score FROM dual UNION ALL
                 SELECT 8 ID, 0.71 score FROM dual),
        a_pc AS (SELECT DISTINCT band,
                        cume_dist() OVER (ORDER BY band) pc_cume_dist
                 FROM   table_a),
        b_pc AS (SELECT id,
                        score,
                        cume_dist() OVER (ORDER BY score DESC) pc_cume_dist
                 FROM   table_b)
SELECT b_pc.id,
       b_pc.score,
       b_pc.pc_cume_dist,
       a_pc2.band
FROM   b_pc,
       LATERAL (SELECT MIN(band) band
                FROM   a_pc
                WHERE  a_pc.pc_cume_dist >= b_pc.pc_cume_dist) a_pc2
ORDER BY b_pc.score DESC;
        ID      SCORE PC_CUME_DIST BAND
---------- ---------- ------------ ----
         6        0.9        0.125 A
         8       0.71         0.25 A
         4       0.55        0.375 B
         2       0.37          0.5 C
         3       0.21        0.625 D
         1       0.12         0.75 D
         7        0.1        0.875 E
         5       0.01            1 F
Here's an example of this running on Oracle LiveSQL (which is at version 12.2).