Counting combinations

Time: 2013-09-20 09:53:41

Tags: sql arrays oracle sorting combinations

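I have an Oracle table with several columns, some of which are populated with values. There are many possible values; the example below is not exhaustive.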

ID  Col1  Col2  Col3  Col4  Col5 Col6
-------------------------------------
1   X2    B2
2   C3    D1    R4
3   B2    X2
4   E4    T1    W2
5   X2    B2
6   R4    D1   
7   D1    R4    C3

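I need to count the distinct combinations, where rows 1, 3 and 5 in the example above are considered the same combination, and rows 2 and 7 are also considered the same. So the desired result is: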

Col1  Col2  Col3  Col4  Col5  Col6  Count(*)
------------------------------------------------
B2    X2                            3
C3    D1    R4                      2
E4    T1    W2                      1
D1    R4                            1

But if I use this:

SELECT Col1, Col2, Col3, Col4, Col5, Col6, Count(*)
FROM MyTable
GROUP BY Col1, Col2, Col3, Col4, Col5, Col6
ORDER BY Count(*) DESC

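then row 3 in my data is treated as unique, even though it has the same combination as rows 1 and 5. Rows 2 and 7 are not treated as the same either, so the result is: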

Col1  Col2  Col3  Col4  Col5  Col6  Count(*)
------------------------------------------------
X2    B2                            2
C3    D1    R4                      1
B2    X2                            1
E4    T1    W2                      1
R4    D1                            1
D1    R4    C3                      1

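It looks like I need to sort the column values within each row before comparing them. But is there an elegant way to do this in Oracle for a large record set (over 3 million records) with up to 20 columns of data?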

2 Answers:

Answer 0: (score: 0)

Two ways come to mind. First, you could write a function that accepts six or more strings and concatenates them in sorted order. Then:

select colstring, count(*)
from
(
  select id, concat_sorted(col1, col2, col3, col4, col5, col6) as colstring
  from MyTable
)
group by colstring;
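
The answer does not define concat_sorted, so the following is only a minimal PL/SQL sketch of what such a function might look like. The parameter names, the comma separator, the choice to skip NULLs, and the limit of six arguments are all assumptions made for illustration.

```sql
-- Hypothetical helper (not part of the original answer): returns the
-- non-NULL arguments sorted in ascending order and joined with commas.
CREATE OR REPLACE FUNCTION concat_sorted (
  p1 IN VARCHAR2 DEFAULT NULL,
  p2 IN VARCHAR2 DEFAULT NULL,
  p3 IN VARCHAR2 DEFAULT NULL,
  p4 IN VARCHAR2 DEFAULT NULL,
  p5 IN VARCHAR2 DEFAULT NULL,
  p6 IN VARCHAR2 DEFAULT NULL
) RETURN VARCHAR2
  DETERMINISTIC
IS
  TYPE t_vals IS TABLE OF VARCHAR2(4000) INDEX BY PLS_INTEGER;
  l_vals   t_vals;
  l_n      PLS_INTEGER := 0;
  l_tmp    VARCHAR2(4000);
  l_j      PLS_INTEGER;
  l_result VARCHAR2(4000);

  -- Collect only the arguments that actually hold a value.
  PROCEDURE add_val (p IN VARCHAR2) IS
  BEGIN
    IF p IS NOT NULL THEN
      l_n := l_n + 1;
      l_vals(l_n) := p;
    END IF;
  END add_val;
BEGIN
  add_val(p1); add_val(p2); add_val(p3);
  add_val(p4); add_val(p5); add_val(p6);

  -- Insertion sort: cheap for a handful of values per row.
  FOR i IN 2 .. l_n LOOP
    l_tmp := l_vals(i);
    l_j   := i - 1;
    WHILE l_j >= 1 AND l_vals(l_j) > l_tmp LOOP
      l_vals(l_j + 1) := l_vals(l_j);
      l_j := l_j - 1;
    END LOOP;
    l_vals(l_j + 1) := l_tmp;
  END LOOP;

  -- Join the sorted values with a comma separator.
  FOR i IN 1 .. l_n LOOP
    l_result := l_result || CASE WHEN i > 1 THEN ',' END || l_vals(i);
  END LOOP;

  RETURN l_result;
END concat_sorted;
/
```

With a function along these lines, the rows 'X2','B2' and 'B2','X2' both map to the string B2,X2 and therefore land in the same group. Covering 20 columns would need a longer parameter list, and calling a PL/SQL function once per row adds SQL-to-PL/SQL switching overhead that may be noticeable on a table with millions of rows.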

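The other way is to turn each column value into a separate record and use listagg over them, provided you have Oracle 11g Release 2 or later available (listagg was introduced in that release):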

select colstring, count(*)
from
(
  select id, listagg (colx, ',') within group (order by colx) as colstring
  from
  (
    select id, col1 as colx from MyTable
    union all
    select id, col2 from MyTable
    union all
    select id, col3 from MyTable
    union all
    select id, col4 from MyTable
    union all
    select id, col5 from MyTable
    union all
    select id, col6 from MyTable
  )
  group by id
)
group by colstring
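
As a possible variant of the query above, the six UNION ALL branches could be replaced with UNPIVOT (available from Oracle 11g), which typically lets Oracle read MyTable once rather than once per column. This is a sketch under that assumption, not a measured result; the names colx, src and cnt are illustrative only.

```sql
-- Sketch: same idea as the UNION ALL query, expressed with UNPIVOT.
SELECT colstring, COUNT(*) AS cnt
FROM (
  -- One row per (id, value); LISTAGG rebuilds a sorted, comma-separated key.
  SELECT id,
         LISTAGG(colx, ',') WITHIN GROUP (ORDER BY colx) AS colstring
  FROM MyTable
  UNPIVOT (colx FOR src IN (col1, col2, col3, col4, col5, col6))
  GROUP BY id
)
GROUP BY colstring
ORDER BY cnt DESC;
```

One difference to be aware of: UNPIVOT excludes NULL values by default, so a row whose columns are all NULL disappears from the result, whereas the UNION ALL version would count such rows under a NULL colstring. For 20 columns, only the IN list needs to grow.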

Answer 1: (score: 0)

Try this:

WITH t AS (
SELECT 1 ID, 'X2' col1, 'B2' col2, NULL col3, NULL col4, NULL col5, NULL col6 FROM dual
UNION
SELECT 2, 'C3', 'D1', 'R4', NULL, NULL, NULL  FROM dual
UNION
SELECT 3, 'B2', 'X2', NULL, NULL, NULL, NULL FROM dual
UNION
SELECT 4, 'E4', 'T1', 'W2', NULL, NULL, NULL FROM dual
UNION
SELECT 5, 'X2', 'B2', NULL, NULL, NULL, NULL FROM dual
UNION
SELECT 6, 'R4', 'D1', NULL, NULL, NULL, NULL FROM dual
UNION
SELECT 7, 'D1', 'R4', 'C3', NULL, NULL, NULL FROM dual
)
SELECT col1, col2, col3, col4, col5, col6, tot_count
FROM (
     SELECT col1, col2, col3, col4, col5, col6, cnt,
            MAX(cnt) OVER (PARTITION BY val) AS tot_count,
            row_number() OVER (PARTITION BY val ORDER BY cnt DESC) AS rn
     FROM (
          SELECT col1, col2, col3, col4, col5, col6, val, count(*) OVER (PARTITION BY val) cnt
          FROM (
               SELECT A.ID, col1, col2, col3, col4, col5, col6, val
               FROM (SELECT ID, col1, col2, col3, col4, col5, col6
                    FROM  t
                    ) A,
                    (SELECT ID, listagg( val,',') WITHIN GROUP(ORDER BY  val DESC) AS val 
                     FROM (
                         SELECT ID, val
                         FROM   t
                         unpivot ( val FOR origin IN (col1,  col2, col3, col4, col5, col6))
                         )
                     GROUP BY ID
                     )b
                WHERE A.ID = b.ID
                )
           ORDER BY val
           )t1 
     )t2
WHERE tot_count = cnt 
AND rn = 1
ORDER BY tot_count DESC;