Given the following data in a table:
| name | d1 | d2 | d3 | d4 | d5 | d6 | d7 | d8 |
|--------|-------|--------|--------|--------|--------|--------|--------|--------|
| matty | 116.7 | 17.88 | 16.1 | 9.731 | (null) | (null) | (null) | (null) |
| jana | 17.88 | 116.7 | 65.45 | 72.1 | (null) | (null) | (null) | (null) |
| chris | 72.1 | (null) | (null) | (null) | (null) | (null) | (null) | (null) |
| khaled | 9.731 | 116.7 | 17.88 | 53.1 | 2 | 85.2 | (null) | (null) |
| " | " | " | " | " | " | " | " | " |
| n | " | " | " | " | " | " | " | " |
How can I count, in SQL, how many times each combination of values occurs across all rows?
Here is a sample of the desired output:
116.7,17.88(3)
116.7,17.88,9.731(2)
72.1(2)
16.1(1)
65.45(1)
53.1(1)
2(1)
85.2(1)
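If this is not possible in SQL, would any alternative approach do?
Answer 0 (score: 3)
PostgreSQL has no built-in function for computing combinations, but you can write one yourself, e.g.: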
create or replace function combinations(variadic anyarray)
returns setof anyarray
language sql
immutable
called on null input
as $func$
with recursive e as (
select *
from unnest($1) with ordinality u(e, o)
where e is not null
),
r as (
select distinct on (e) array[e] ea, array[o] oa
from e
union all
select distinct on (oea) oea, oa || o
from r, e, lateral (select array_agg(u order by u) oea from unnest(ea || e) u) l
where o <> all(oa)
)
select ea
from r
$func$;
With this function, you can write a query like:
select combinations, count(*)
from table_name
cross join combinations(d1, d2, d3, d4, d5, d6, d7, d8)
group by 1
However, your sample input contains far more combinations than your sample output lists. (Maybe you just left them out to save space?)
http://rextester.com/NNVK84197
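Since the question also allows non-SQL alternatives, here is a minimal Python sketch of the same idea, assuming the rows fit in memory. The names `ROWS`, `row_combinations` and `count_combinations` are made up for illustration, and `None` stands in for SQL `NULL`. Like the SQL function, it treats a combination as order-insensitive and counts it at most once per row.

```python
from collections import Counter
from itertools import combinations

# Sample rows from the question; None stands in for SQL NULL.
ROWS = {
    "matty":  [116.7, 17.88, 16.1, 9.731, None, None, None, None],
    "jana":   [17.88, 116.7, 65.45, 72.1, None, None, None, None],
    "chris":  [72.1, None, None, None, None, None, None, None],
    "khaled": [9.731, 116.7, 17.88, 53.1, 2, 85.2, None, None],
}

def row_combinations(values):
    """Return every distinct, order-insensitive combination of the non-null values."""
    present = sorted(v for v in values if v is not None)
    combos = set()  # a set, so equal-valued combinations collapse within one row
    for size in range(1, len(present) + 1):
        combos.update(combinations(present, size))
    return combos

def count_combinations(rows):
    """Count in how many rows each combination appears."""
    counts = Counter()
    for values in rows.values():
        counts.update(row_combinations(values))
    return counts

counts = count_combinations(ROWS)
for combo, n in counts.most_common(5):
    print(",".join(map(str, combo)), f"({n})")
```

A row with n non-null values yields 2^n - 1 combinations, so this only stays practical for a small number of columns, which is also true of the SQL version.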
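Notes:
The function takes a `variadic` parameter, so it can be called with any number of arguments.
The parameter is declared as `anyarray`, so it accepts arrays of any element type. This is called polymorphism. Also, because of `returns setof anyarray`, it returns a whole result set (multiple rows) of the same array type.
`language sql` just keeps the function body simple: it cannot contain procedural language constructs such as `IF` or `LOOP` (`language plpgsql` can).
The CTE aliased `e` unnests the data from the input array but keeps the ordering/index information in the `o` column (see `with ordinality`). This is essential later, because we cannot use the values themselves to remove duplicates (i.e. `(2, 2)` should be a valid combination, as noted earlier). `NULL`s are discarded.
The recursive CTE aliased `r` (hence the `recursive` keyword after `with`) accumulates every combination. It starts from each single value. Then, in each step, it appends one element that has a different ordinal (index) in the original set (see `where o <> all(oa)`). Because the order of elements within a combination does not matter (as you commented), I sort the elements in a sub-query. Also, both parts of the recursive query use `distinct on (<combination>)` to remove possible duplicates, which can occur when several elements have the same value.
The final query uses a `LATERAL` join (implicit for a function call in `FROM`) to compute every combination for each row. This step multiplies each original row of the table by its combinations. After that, we only need `GROUP BY combinations` and `COUNT(*)` for each.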
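To make the role of the ordinals concrete, here is a rough Python sketch of the level-by-level expansion that the recursive CTE performs. It is an illustration under assumptions, not a line-for-line port: the function name `combinations_by_index` is made up, and `None` stands in for `NULL`. Each combination carries the indices it has already used, so a repeated value such as `2, 2` can still form the combination `(2, 2)`.

```python
def combinations_by_index(values):
    """Expand combinations one element at a time, tracking used positions."""
    # Mirrors CTE e: keep (value, ordinal) pairs and drop NULLs (None here).
    elements = [(v, o) for o, v in enumerate(values, start=1) if v is not None]

    # Non-recursive term: one single-element combination per distinct value.
    level = {}
    for v, o in elements:
        level.setdefault((v,), (o,))

    result = []
    while level:
        result.extend(level)  # collect this level's combinations (the dict keys)
        next_level = {}
        for combo, used in level.items():
            for v, o in elements:
                if o not in used:  # same test as: where o <> all(oa)
                    # Sort so element order does not matter, then keep only
                    # one entry per sorted combination (like distinct on).
                    key = tuple(sorted(combo + (v,)))
                    next_level.setdefault(key, used + (o,))
        level = next_level
    return result

print(combinations_by_index([2, 2, 3]))
```

Deduplicating on values alone would have rejected the second `2`; comparing positions keeps it while the dictionary keys still collapse combinations that are equal as sorted value lists.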
Answer 1 (score: 0)
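In the query below I did not consider distinct combinations of `d1`, `d2`: if both hold the same value, you simply get that one value with a count of 2.
So, assuming the number of columns is limited and fixed, you can do it with the help of `union` (Rextester Demo):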
select concat(array_to_string(array_agg(col),',') ,' (', cnt ,')' ) as result
from
(
select col,count(*) cnt
from
( select d1 as col from table1
union all
select d2 from table1
union all
select d3 from table1
--similarly add other columns
) t
where col is not null
group by col
) t1
group by cnt
order by cnt desc;
Output
result
--------------------------
17.88,116.7 (3)
72.1,16.1,65.45,9.731 (1)
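Otherwise, you would have to create a procedure that pulls all the columns into the union, and then apply `group by` and `count` as above.
Answer 2 (score: 0)
I think you can try something like this (you would have to extend it to get the combinations of 4, 5, 6, 7 and 8 values; I stopped at 3 values).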
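For comparison, the same union-then-group idea can be sketched in Python. This is an illustration only: the names `ROWS` and `values_grouped_by_count` are invented, and `None` stands in for `NULL`. It runs over all eight columns of the question's sample rows, so its figures need not match the output listed above.

```python
from collections import Counter, defaultdict

# Sample rows from the question (columns d1..d8); None stands in for SQL NULL.
ROWS = [
    [116.7, 17.88, 16.1, 9.731, None, None, None, None],
    [17.88, 116.7, 65.45, 72.1, None, None, None, None],
    [72.1, None, None, None, None, None, None, None],
    [9.731, 116.7, 17.88, 53.1, 2, 85.2, None, None],
]

def values_grouped_by_count(rows):
    """Count each individual value, then group the values by that count."""
    # Like the UNION ALL plus "where col is not null": flatten every column.
    per_value = Counter(v for row in rows for v in row if v is not None)
    # Like the outer "group by cnt": collect the values that share a count.
    grouped = defaultdict(list)
    for value, cnt in per_value.items():
        grouped[cnt].append(value)
    # Highest count first, as in "order by cnt desc".
    return dict(sorted(grouped.items(), reverse=True))

for cnt, values in values_grouped_by_count(ROWS).items():
    print(",".join(map(str, values)), f"({cnt})")
```

Note that this counts single values, not combinations within a row, which is the limitation the answer itself points out.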
with CTE_001 as (
SELECT name,D1 AS XVAL FROM mytable2 WHERE D1 IS NOT NULL
UNION ALL
SELECT name,D2 FROM mytable2 WHERE D2 IS NOT NULL
UNION ALL
SELECT name,D3 FROM mytable2 WHERE D3 IS NOT NULL
UNION ALL
SELECT name,D4 FROM mytable2 WHERE D4 IS NOT NULL
UNION ALL
SELECT name,D5 FROM mytable2 WHERE D5 IS NOT NULL
UNION ALL
SELECT name,D6 FROM mytable2 WHERE D6 IS NOT NULL
UNION ALL
SELECT name,D7 FROM mytable2 WHERE D7 IS NOT NULL
UNION ALL
SELECT name,D8 FROM mytable2 WHERE D8 IS NOT NULL
)
SELECT CONCAT(XVAL1, ', ', XVAL2) AS LOV, COUNT(*) AS RC
FROM(
SELECT C1.NAME, C1.XVAL AS XVAL1, C2.XVAL AS XVAL2
FROM CTE_001 C1
INNER JOIN CTE_001 C2 ON C1.NAME = C2.NAME
WHERE C1.XVAL < C2.XVAL
) B
GROUP BY XVAL1, XVAL2
HAVING COUNT(*) >1
UNION ALL
SELECT CONCAT(XVAL1, ', ' , XVAL2,', ', XVAL3), COUNT(*) AS RC
FROM(
SELECT C1.NAME, C1.XVAL AS XVAL1, C2.XVAL AS XVAL2, C3.XVAL AS XVAL3
FROM CTE_001 C1
INNER JOIN CTE_001 C2 ON C1.NAME = C2.NAME
INNER JOIN CTE_001 C3 ON C1.NAME = C3.NAME
WHERE C1.XVAL < C2.XVAL AND C1.XVAL < C3.XVAL AND C2.XVAL < C3.XVAL
) B
GROUP BY XVAL1, XVAL2, XVAL3
HAVING COUNT(*) >1
ORDER BY 2 DESC
Output:
lov rc
1 17.880, 116.700 3
2 9.731, 116.700 2
3 9.731, 17.880 2
4 9.731, 17.880, 116.700 2
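The self-join logic can be checked outside the database with a short Python sketch. The names `ROWS` and `repeated_pairs_and_triples` are made up for this illustration, and `None` stands in for `NULL`. The strict `<` comparisons play the same role as `C1.XVAL < C2.XVAL` in the query: they keep one ordering of each pair or triple and drop equal values.

```python
from collections import Counter

# Sample rows from the question; None stands in for SQL NULL.
ROWS = {
    "matty":  [116.7, 17.88, 16.1, 9.731, None, None, None, None],
    "jana":   [17.88, 116.7, 65.45, 72.1, None, None, None, None],
    "chris":  [72.1, None, None, None, None, None, None, None],
    "khaled": [9.731, 116.7, 17.88, 53.1, 2, 85.2, None, None],
}

def repeated_pairs_and_triples(rows):
    """Count ordered pairs and triples per row; keep those seen more than once."""
    counts = Counter()
    for values in rows.values():
        vals = [v for v in values if v is not None]  # the IS NOT NULL filters
        # Self-join on the same row name with strictly increasing values.
        counts.update((a, b) for a in vals for b in vals if a < b)
        counts.update(
            (a, b, c) for a in vals for b in vals for c in vals if a < b < c
        )
    # Equivalent of HAVING COUNT(*) > 1.
    return {combo: n for combo, n in counts.items() if n > 1}

for combo, n in sorted(repeated_pairs_and_triples(ROWS).items(), key=lambda kv: -kv[1]):
    print(", ".join(map(str, combo)), n)
```

For the sample rows this yields the same four combinations and counts as the output above. Extending it to 4 through 8 values means adding one more nested comparison per size, which mirrors the extra joins the SQL would need.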