数组交集作为group by的聚合函数

时间:2017-04-24 15:12:19

标签: sql postgresql group-by aggregate-functions

我有下表:

CREATE TABLE person
AS
  SELECT name, preferences
  FROM ( VALUES
    ( 'John', ARRAY['pizza', 'meat'] ),
    ( 'John', ARRAY['pizza', 'spaghetti'] ),
    ( 'Bill', ARRAY['lettuce', 'pizza'] ),
    ( 'Bill', ARRAY['tomatoes'] )
  ) AS t(name, preferences);

我希望以group by person intersect(preferences)作为聚合函数。所以我想要以下输出:

person | preferences
-------------------------------
John   | ['pizza']
Bill   | []

如何在SQL中完成此操作?我想我需要做类似以下的事情,但X函数是什么样的?

SELECT    person.name, array_agg(X)
FROM      person
LEFT JOIN unnest(preferences) preferences
ON        true
GROUP BY  name

3 个答案:

答案 0 :(得分:3)

您可以创建自己的聚合函数:

CREATE OR REPLACE FUNCTION arr_sec_agg_f(anyarray, anyarray) RETURNS anyarray
   LANGUAGE sql IMMUTABLE AS
   'SELECT CASE
              WHEN $1 IS NULL
              THEN $2
              WHEN $2 IS NULL
              THEN $1
              ELSE array_agg(x)
           END
    FROM (SELECT x FROM unnest($1) a(x)
          INTERSECT
          SELECT x FROM unnest($2) a(x)
         ) q';

CREATE AGGREGATE arr_sec_agg(anyarray) (
   SFUNC = arr_sec_agg_f(anyarray, anyarray),
   STYPE = anyarray
);

SELECT name, arr_sec_agg(preferences)
FROM person
GROUP BY name;

┌──────┬─────────────┐
│ name │ arr_sec_agg │
├──────┼─────────────┤
│ John │ {pizza}     │
│ Bill │             │
└──────┴─────────────┘
(2 rows)

答案 1 :(得分:2)

FILTERARRAY_AGG

一起使用
SELECT name, array_agg(pref) FILTER (WHERE namepref = total)
FROM (
  SELECT name, pref, t1.count AS total, count(*) AS namepref
  FROM (
    SELECT name, preferences, count(*) OVER (PARTITION BY name)
    FROM person
  ) AS t1
  CROSS JOIN LATERAL unnest(preferences) AS pref
  GROUP BY name, total, pref
) AS t2
GROUP BY name;

以下是使用ARRAY构造函数和DISTINCT执行此操作的一种方法。

WITH t AS (
  SELECT name, pref, t1.count AS total, count(*) AS namepref
  FROM (
    SELECT name, preferences, count(*) OVER (PARTITION BY name)
    FROM person
  ) AS t1
  CROSS JOIN LATERAL unnest(preferences) AS pref
  GROUP BY name, total, pref
)
SELECT DISTINCT
  name,
  ARRAY(SELECT pref FROM t AS t2 WHERE total=namepref AND t.name = t2.name)
FROM t;

答案 2 :(得分:1)

如果您不能选择编写自定义聚合(如@LaurenzAlbe提供),则 通常在recursive CTE中注册相同的逻辑:

with recursive cte(name, pref_intersect, pref_prev, iteration) as (
    select   name,
             min(preferences),
             min(preferences),
             0
    from     your_table
    group by name
  union all
    select   name,
             array(select e from unnest(pref_intersect) e
                   intersect
                   select e from unnest(pref_next) e),
             pref_next,
             iteration + 1
    from     cte,
    lateral  (select   your_table.preferences pref_next
              from     your_table
              where    your_table.name        = cte.name
              and      your_table.preferences > cte.pref_prev
              order by your_table.preferences
              limit    1) n
)
select   distinct on (name) name, pref_intersect
from     cte
order by name, iteration desc

http://rextester.com/ZQMGW66052

这里的主要想法是找到一个可以“遍历”行的顺序。我使用了preferences数组的自然顺序(因为没有显示很多列)。理想情况下,这种排序应该发生在(a)唯一字段(最好是在主键上),但是在这里,因为preferences列中的重复不会影响交集的结果,所以它就足够了。