Counting combinations of values in rows across an entire table

Time: 2017-06-09 07:42:24

Tags: sql postgresql

Given the following data in a table:

|   name |    d1 |     d2 |     d3 |     d4 |     d5 |     d6 |     d7 |     d8 |
|--------|-------|--------|--------|--------|--------|--------|--------|--------|
|  matty | 116.7 |  17.88 |   16.1 |  9.731 | (null) | (null) | (null) | (null) |
|   jana | 17.88 |  116.7 |  65.45 |   72.1 | (null) | (null) | (null) | (null) |
|  chris |  72.1 | (null) | (null) | (null) | (null) | (null) | (null) | (null) |
| khaled | 9.731 |  116.7 |  17.88 |   53.1 |      2 |   85.2 | (null) | (null) |
|    "   |   "   |   "    |   "    |   "    |   "    |   "    |   "    |   "    |
|    n   |   "   |   "    |   "    |   "    |   "    |   "    |   "    |   "    |

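How can I identify, in SQL, how many times each combination of values occurs across all rows?

Here is a sample of the desired output:

116.7,17.88(3)

116.7,17.88,9.731(2)

72.1(2)

16.1(1)

65.45(1)

53.1(1)

2(1)

85.2(1)

If this cannot be done in SQL, is there an alternative approach that would work?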

3 answers:

Answer 0: (score: 3)

PostgreSQL has no built-in function for computing combinations, but you can write one yourself, e.g.:

create or replace function combinations(variadic anyarray)
  returns setof anyarray
  language sql
  immutable
  called on null input
as $func$
  with recursive e as (
      select *
      from   unnest($1) with ordinality u(e, o)
      where  e is not null
  ),
  r as (
      select distinct on (e) array[e] ea, array[o] oa
      from   e
    union all
      select distinct on (oea) oea, oa || o
      from   r, e, lateral (select array_agg(u order by u) oea from unnest(ea || e) u) l
      where  o <> all(oa)
  )
  select ea
  from   r
$func$;

With this function, you can write a query like this:

select     combinations, count(*)
from       table_name
cross join combinations(d1, d2, d3, d4, d5, d6, d7, d8)
group by   1

However, your sample input yields many more combinations than your sample output contains. (Perhaps you just left them out to save space?)

http://rextester.com/NNVK84197
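To get a feel for how many combinations each row contributes, here is a rough Python sketch. It assumes the values within a row are distinct (true for the four named sample rows), in which case a row with k non-NULL values has 2^k - 1 non-empty combinations. The `rows` dictionary is just the sample data retyped by hand, not something read from the database.

```python
# The four named sample rows, with NULLs already dropped (retyped by hand).
rows = {
    "matty": [116.7, 17.88, 16.1, 9.731],
    "jana": [17.88, 116.7, 65.45, 72.1],
    "chris": [72.1],
    "khaled": [9.731, 116.7, 17.88, 53.1, 2, 85.2],
}

# A row with k distinct values has 2**k - 1 non-empty subsets.
per_row = {name: 2 ** len(set(values)) - 1 for name, values in rows.items()}

for name, n in per_row.items():
    print(name, n)
```

For the sample this gives 15, 15, 1 and 63 combinations per row, compared with the eight lines in the sample output.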

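Notes:

  • The function above takes a variable number of arguments, which are converted into a native PostgreSQL array (because of variadic).
  • It accepts input of any type, as long as all arguments share the same type (because of anyarray). This is called polymorphism. Also, because of returns setof anyarray, it returns a full result set (multiple rows) of the same array type.
  • language sql just keeps the function body simple: it will not contain any advanced procedural constructs such as IF or LOOP (language plpgsql can contain these).
  • The CTE aliased e unnests the data from the input array while keeping the ordering/index information in the o field (see with ordinality). This is essential later, because we cannot use the values themselves to remove duplicates (i.e. (2, 2) should be a valid combination, as stated earlier). NULLs are discarded.
  • The recursive CTE aliased r (hence the recursive keyword after with) accumulates each combination. It starts with every single value. Then, at each step, it appends one element whose ordinality (index) differs from those already in the combination (see where o <> all(oa)). Because the order of elements within a combination does not matter (as you commented), I sort the elements in a subquery. In addition, both parts of the recursive query use distinct on (<combination>) to remove possible duplicates, which can occur when several elements have the same value.
  • The solution query uses an implicit LATERAL join to compute every combination for each row. This step multiplies each original row of the table by the number of its combinations. After that, we only need to GROUP BY the combinations and take COUNT(*) for each.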
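For anyone open to a non-SQL alternative, the same idea can be sketched in Python with itertools.combinations. This is only an illustration under assumptions, not a translation of the function: the `rows` list is the four named sample rows retyped by hand, `row_combinations` is a made-up helper name, and None stands in for NULL.

```python
from collections import Counter
from itertools import combinations

# The four named sample rows (d1..d8); None stands in for NULL.
rows = [
    [116.7, 17.88, 16.1, 9.731, None, None, None, None],
    [17.88, 116.7, 65.45, 72.1, None, None, None, None],
    [72.1, None, None, None, None, None, None, None],
    [9.731, 116.7, 17.88, 53.1, 2, 85.2, None, None],
]

def row_combinations(row):
    """All distinct non-empty combinations of one row's non-NULL values."""
    # Sorting first makes element order irrelevant, like the ordered array_agg.
    values = sorted(v for v in row if v is not None)
    result = set()  # the set plays the role of DISTINCT ON
    for size in range(1, len(values) + 1):
        result.update(combinations(values, size))
    return result

# Equivalent of the cross join plus GROUP BY / COUNT(*).
counts = Counter()
for row in rows:
    counts.update(row_combinations(row))

for combo, n in counts.most_common(5):
    print(",".join(str(v) for v in combo), f"({n})")
```

Each combination is counted at most once per row, and a row holding the same value twice would still yield a combination such as (2, 2), which appears to match what the SQL function does.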

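Answer 1: (score: 0)

In the case below, I have not considered different combinations of d1 and d2. If both hold the same value, you will get that one value with a count of 2.

So, assuming the number of columns is limited and fixed, you can do it with the help of union.

Rextester Demo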

select concat(array_to_string(array_agg(col),',')   ,' (',  cnt ,')' ) as result
from 
(
    select col,count(*) cnt
    from 
    (   select d1 as col from table1
        union all 
        select d2 from table1
        union all
        select d3 from table1
         --similarly add other columns
    ) t
    where col is not null
    group by col
) t1
group by cnt
order by cnt desc;

Output

result 
--------------------------
17.88,116.7 (3) 
72.1,16.1,65.45,9.731 (1) 

Otherwise, you have to create a procedure that puts all the columns into the union, and then group by and count as above.
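The same logic can be sketched outside the database in Python. This is a minimal illustration, assuming the first three columns (d1 to d3) of the four named sample rows, as in the query above. The `rows` list is retyped by hand and None stands in for NULL.

```python
from collections import Counter, defaultdict

# d1..d3 of the four named sample rows; None stands in for NULL.
rows = [
    [116.7, 17.88, 16.1],
    [17.88, 116.7, 65.45],
    [72.1, None, None],
    [9.731, 116.7, 17.88],
]

# Inner query: unpivot every column into one stream and count each value.
value_counts = Counter(v for row in rows for v in row if v is not None)

# Outer query: collect the values that share the same count.
by_count = defaultdict(list)
for value, cnt in value_counts.items():
    by_count[cnt].append(value)

for cnt in sorted(by_count, reverse=True):
    print(",".join(str(v) for v in sorted(by_count[cnt])), f"({cnt})")
```

Note that this groups values by how often each one occurs in the whole table. It does not check that the grouped values actually appear together in the same row.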

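Answer 2: (score: 0)

I think you can try something like this (you will have to extend it to get combinations of 4, 5, 6, 7 and 8 values: I stopped at 3 values).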

    with CTE_001 as (
        SELECT name,D1 AS XVAL FROM mytable2 WHERE D1 IS NOT NULL
        UNION ALL
        SELECT name,D2 FROM mytable2 WHERE D2 IS NOT NULL
        UNION ALL
        SELECT name,D3 FROM mytable2 WHERE D3 IS NOT NULL
        UNION ALL
        SELECT name,D4 FROM mytable2 WHERE D4 IS NOT NULL
        UNION ALL
        SELECT name,D5 FROM mytable2 WHERE D5 IS NOT NULL
        UNION ALL
        SELECT name,D6 FROM mytable2 WHERE D6 IS NOT NULL
        UNION ALL
        SELECT name,D7 FROM mytable2 WHERE D7 IS NOT NULL
        UNION ALL
        SELECT name,D8 FROM mytable2 WHERE D8 IS NOT NULL
    )
    SELECT CONCAT(XVAL1, ', ', XVAL2) AS LOV, COUNT(*) AS RC 
    FROM(
     SELECT C1.NAME, C1.XVAL AS XVAL1, C2.XVAL AS XVAL2
        FROM CTE_001 C1 
        INNER JOIN CTE_001 C2 ON C1.NAME = C2.NAME
        WHERE C1.XVAL < C2.XVAL
        ) B 
        GROUP BY XVAL1, XVAL2
        HAVING COUNT(*) >1
    UNION ALL
    SELECT CONCAT(XVAL1, ', ' , XVAL2,', ', XVAL3), COUNT(*) AS RC 
    FROM(
    SELECT C1.NAME, C1.XVAL AS XVAL1, C2.XVAL AS XVAL2, C3.XVAL AS XVAL3
        FROM CTE_001 C1 
        INNER JOIN CTE_001 C2 ON C1.NAME = C2.NAME
        INNER JOIN CTE_001 C3 ON C1.NAME = C3.NAME
        WHERE C1.XVAL < C2.XVAL  AND C1.XVAL < C3.XVAL AND C2.XVAL < C3.XVAL
        ) B
        GROUP BY XVAL1, XVAL2, XVAL3
        HAVING COUNT(*) >1
    ORDER BY 2 DESC

Output:

    lov rc
1   17.880, 116.700 3
2   9.731, 116.700  2
3   9.731, 17.880   2
4   9.731, 17.880, 116.700  2
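
If writing out one self-join per combination size gets tedious, the same idea can be sketched in Python for any size. This is an illustration under assumptions rather than a port of the query: `rows` is the four named sample rows retyped by hand, `repeated_combinations` is a made-up helper, and values are deduplicated within each row, which the joins above do not do.

```python
from collections import Counter
from itertools import combinations

# The four named sample rows, keyed by name; None stands in for NULL.
rows = {
    "matty": [116.7, 17.88, 16.1, 9.731, None, None, None, None],
    "jana": [17.88, 116.7, 65.45, 72.1, None, None, None, None],
    "chris": [72.1, None, None, None, None, None, None, None],
    "khaled": [9.731, 116.7, 17.88, 53.1, 2, 85.2, None, None],
}

def repeated_combinations(rows, min_size=2, min_count=2):
    """Combinations of at least min_size values found in min_count+ rows."""
    counts = Counter()
    for values in rows.values():
        # Sorted distinct values mirror the strict XVAL < XVAL join conditions.
        distinct = sorted({v for v in values if v is not None})
        for size in range(min_size, len(distinct) + 1):
            counts.update(combinations(distinct, size))
    # Equivalent of HAVING COUNT(*) > 1 when min_count is 2.
    return {combo: n for combo, n in counts.items() if n >= min_count}

repeated = repeated_combinations(rows)
for combo, n in sorted(repeated.items(), key=lambda item: -item[1]):
    print(", ".join(str(v) for v in combo), n)
```

On the sample data this yields the same four combinations as the output above. No combination of four or more values occurs in more than one row.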