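I have an Oracle table with several columns, some of which are populated with values. There is a large number of possible values; the example below is not exhaustive.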
ID Col1 Col2 Col3 Col4 Col5 Col6
-------------------------------------
1 X2 B2
2 C3 D1 R4
3 B2 X2
4 E4 T1 W2
5 X2 B2
6 R4 D1
7 D1 R4 C3
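I need to count the distinct combinations, where rows 1, 3, and 5 in the example above are considered the same combination, and rows 2 and 7 are also considered the same. So the desired result is: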
Col1 Col2 Col3 Col4 Col5 Col6 Count(*)
------------------------------------------------
B2 X2 3
C3 D1 R4 2
E4 T1 W2 1
D1 R4 1
But if I use this:
SELECT Col1, Col2, Col3, Col4, Col5, Col6, Count(*)
FROM MyTable
GROUP BY Col1, Col2, Col3, Col4, Col5, Col6
ORDER BY Count(*) DESC
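then row 3 of my data is treated as unique, even though it has the same combination as rows 1 and 5. Rows 2 and 7 are likewise not treated as the same. The result looks like this: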
Col1 Col2 Col3 Col4 Col5 Col6 Count(*)
------------------------------------------------
X2 B2 2
C3 D1 R4 1
B2 X2 1
E4 T1 W2 1
R4 D1 1
D1 R4 C3 1
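It looks like I need to sort the column values before comparing them. But for a large record set (3 million+ records) with up to 20 columns of data in Oracle, is there an elegant solution?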
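Answer 0: (score: 0)
Two approaches come to mind. First, you could write a function that accepts six or more strings and concatenates them in sorted order. Then: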
select colstring, count(*)
from
(
  select id, concat_sorted(col1, col2, col3, col4, col5, col6) as colstring
  from MyTable
)
group by colstring;
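The answer names `concat_sorted` but does not define it. A minimal sketch of what such a function might look like is given here; the body, the parameter names, and the comma separator are all assumptions, and it relies on `LISTAGG` (Oracle 11g Release 2 or later) and on the values never containing a comma.

```sql
-- Hypothetical implementation: the answer only names concat_sorted, it does not define it.
CREATE OR REPLACE FUNCTION concat_sorted (
  p1 IN VARCHAR2 DEFAULT NULL,
  p2 IN VARCHAR2 DEFAULT NULL,
  p3 IN VARCHAR2 DEFAULT NULL,
  p4 IN VARCHAR2 DEFAULT NULL,
  p5 IN VARCHAR2 DEFAULT NULL,
  p6 IN VARCHAR2 DEFAULT NULL
) RETURN VARCHAR2
IS
  l_result VARCHAR2(4000);
BEGIN
  -- Treat the arguments as rows, drop the NULLs, and join the rest in sorted order.
  SELECT LISTAGG(column_value, ',') WITHIN GROUP (ORDER BY column_value)
    INTO l_result
    FROM TABLE(sys.odcivarchar2list(p1, p2, p3, p4, p5, p6))
   WHERE column_value IS NOT NULL;
  RETURN l_result;
END concat_sorted;
/
```

With this sketch, `concat_sorted('X2', 'B2')` and `concat_sorted('B2', 'X2')` produce the same string, which is what lets the outer `group by` merge rows 1, 3, and 5. Each call runs a small SQL statement from PL/SQL, so on millions of rows it may well be slower than a pure-SQL approach; that is an expectation, not a measurement. For 20 columns the parameter list would need to be extended accordingly.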
The other approach is to turn each column into a separate record and use listagg on them, provided you have Oracle 11g Release 2 or later available:
select colstring, count(*)
from
(
  select id, listagg(colx, ',') within group (order by colx) as colstring
  from
  (
    select id, col1 as colx from MyTable
    union all
    select id, col2 from MyTable
    union all
    select id, col3 from MyTable
    union all
    select id, col4 from MyTable
    union all
    select id, col5 from MyTable
    union all
    select id, col6 from MyTable
  )
  group by id
)
group by colstring;
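The listagg query returns each combination as a single string, whereas the question asks for the values back in Col1 to Col6. One possible way to split the string again is `REGEXP_SUBSTR`, sketched below. The `combos` name and the `cnt` alias are illustrative, and the sketch assumes no value contains a comma.

```sql
-- Sketch: rebuild Col1..Col6 from the sorted, comma-separated combination string.
WITH combos AS (
  SELECT id, LISTAGG(colx, ',') WITHIN GROUP (ORDER BY colx) AS colstring
  FROM (
    SELECT id, col1 AS colx FROM MyTable
    UNION ALL
    SELECT id, col2 FROM MyTable
    UNION ALL
    SELECT id, col3 FROM MyTable
    UNION ALL
    SELECT id, col4 FROM MyTable
    UNION ALL
    SELECT id, col5 FROM MyTable
    UNION ALL
    SELECT id, col6 FROM MyTable
  )
  GROUP BY id
)
SELECT REGEXP_SUBSTR(colstring, '[^,]+', 1, 1) AS col1,  -- 1st value in sorted order
       REGEXP_SUBSTR(colstring, '[^,]+', 1, 2) AS col2,  -- 2nd value, NULL if absent
       REGEXP_SUBSTR(colstring, '[^,]+', 1, 3) AS col3,
       REGEXP_SUBSTR(colstring, '[^,]+', 1, 4) AS col4,
       REGEXP_SUBSTR(colstring, '[^,]+', 1, 5) AS col5,
       REGEXP_SUBSTR(colstring, '[^,]+', 1, 6) AS col6,
       COUNT(*) AS cnt
FROM combos
GROUP BY colstring
ORDER BY cnt DESC;
```

Because the values come back in sorted order, they are left-aligned into col1, col2, and so on, regardless of which column they originally occupied.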
Answer 1: (score: 0)
Try this:
WITH t AS (
  -- Sample data from the question
  SELECT 1 ID, 'X2' col1, 'B2' col2, NULL col3, NULL col4, NULL col5, NULL col6 FROM dual
  UNION ALL
  SELECT 2, 'C3', 'D1', 'R4', NULL, NULL, NULL FROM dual
  UNION ALL
  SELECT 3, 'B2', 'X2', NULL, NULL, NULL, NULL FROM dual
  UNION ALL
  SELECT 4, 'E4', 'T1', 'W2', NULL, NULL, NULL FROM dual
  UNION ALL
  SELECT 5, 'X2', 'B2', NULL, NULL, NULL, NULL FROM dual
  UNION ALL
  SELECT 6, 'R4', 'D1', NULL, NULL, NULL, NULL FROM dual
  UNION ALL
  SELECT 7, 'D1', 'R4', 'C3', NULL, NULL, NULL FROM dual
)
SELECT col1, col2, col3, col4, col5, col6, tot_count
FROM (
  SELECT col1, col2, col3, col4, col5, col6, cnt,
         MAX(cnt) OVER (PARTITION BY val) AS tot_count,
         row_number() OVER (PARTITION BY val ORDER BY cnt DESC) AS rn
  FROM (
    SELECT col1, col2, col3, col4, col5, col6, val, count(*) OVER (PARTITION BY val) cnt
    FROM (
      SELECT A.ID, col1, col2, col3, col4, col5, col6, val
      FROM (SELECT ID, col1, col2, col3, col4, col5, col6
            FROM t
           ) A,
           (SELECT ID, listagg(val, ',') WITHIN GROUP (ORDER BY val DESC) AS val
            FROM (
              SELECT ID, val
              FROM t
              unpivot (val FOR origin IN (col1, col2, col3, col4, col5, col6))
            )
            GROUP BY ID
           ) b
      WHERE A.ID = b.ID
    )
    ORDER BY val
  ) t1
) t2
WHERE tot_count = cnt
AND rn = 1
ORDER BY tot_count DESC;
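If only the combination and its count are needed, and not a representative row with the original column positions, the same unpivot-plus-listagg idea can probably be condensed. The following is a sketch rather than a tested replacement; it assumes Oracle 11g Release 2 or later, that col1 to col6 share one data type (which `UNPIVOT` requires), and that no value contains a comma. The `cnt` alias is illustrative.

```sql
-- Sketch: one sorted combination string per row, then count rows per string.
SELECT colstring, COUNT(*) AS cnt
FROM (
  SELECT id,
         LISTAGG(val, ',') WITHIN GROUP (ORDER BY val) AS colstring
  FROM MyTable
  UNPIVOT (val FOR origin IN (col1, col2, col3, col4, col5, col6))
  GROUP BY id
)
GROUP BY colstring
ORDER BY cnt DESC;
```

`UNPIVOT` excludes NULLs by default, so empty columns do not appear in the combination string; a row whose six columns are all NULL drops out of the result entirely. On the question's sample data this groups rows 1, 3, and 5 under `B2,X2` and rows 2 and 7 under `C3,D1,R4`.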