Querying with a running sum in Postgres

Time: 2015-01-22 06:30:11

Tags: sql postgresql amazon-redshift

I'm having trouble writing a query for Postgres (strictly speaking, it's Redshift). The table data is shown below. The table is PARTITION BY user_id ORDER BY created_at desc.

Data

user_id| x | y |  min |     created_at      
-------+---+---+------+---------------------
      1| 1 | 1 |    1 | 2015-01-15 17:26:53
      1| 1 | 1 |    2 | 2015-01-15 17:26:54
      1| 1 | 1 |    3 | 2015-01-15 17:26:55
      1| 2 | 1 |   10 | 2015-01-16 02:46:21
      1| 1 | 1 |   15 | 2015-01-16 02:46:22
      1| 3 | 3 |   11 | 2015-01-16 03:01:44
      1| 3 | 3 |    2 | 2015-01-16 03:02:06
      2| 1 | 1 |    3 | 2015-01-16 03:02:12
      2| 2 | 1 |    4 | 2015-01-16 03:02:15
      2| 2 | 1 |    7 | 2015-01-16 03:02:18

What I want is

Desired result

user_id| x | y |  sum_min |
-------+---+---+----------+
      1| 1 | 1 |        6 |
      1| 2 | 1 |       10 |
      1| 1 | 1 |       15 |
      1| 3 | 3 |       13 |
      2| 1 | 1 |        3 |
      2| 2 | 1 |       11 |

If I simply group by user_id, x, y, the result would be

 user_id| x | y |  sum_min |
 -------+---+---+----------+
       1| 1 | 1 |       21 |
       :| : | : |        : |

That's no good for me :(

3 answers:

Answer 0 (score: 1)

Try this

with cte as (
    select user_id, x, y, created_at,
           -- total of min over all rows sharing user_id, x, y and calendar day
           sum(min) over (partition by user_id, x, y, created_day) as sum_min
    from (
        select user_id, x, y, min, created_at,
               replace(created_at::date::text, '-', '') as created_day
        from usr
    ) t
)

-- collapse the per-row window results to one row per cluster
select user_id, x, y, sum_min
from cte
group by sum_min, user_id, x, y
order by user_id

Answer 1 (score: 0)

Maybe try grouping by creation date:

select user_id, x, y, sum(min), created_at::date
from test
group by user_id, x, y, created_at::date
-- order by the grouped expression; the raw created_at column is not in the GROUP BY
order by user_id, x, y, created_at::date

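Answer 2 (score: 0)

It seems that what you want to do is calculate an aggregate function over clusters of records, ordered on one column, where a cluster is a run of consecutive records with the same values in three columns and a new cluster starts whenever any of those three values changes. That is impossible in standard SQL, because the order of records is not relevant to any SQL command. The fact that you order by date does not change that: SQL commands simply do not support this kind of stratification.

The only option I know of is to create a plpgsql function with a cursor on your relation data (probably a view, but it works just as well on a table). You iterate over all the records in the relation and, for each cluster you encounter, sum the min values and output a new record with the cluster columns and the summed value.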

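CREATE FUNCTION sum_clusters()
RETURNS TABLE (user_id int, x int, y int, sum_int int) AS $$
DECLARE
  data_row data%ROWTYPE;
  -- Columns are qualified with d. so they do not clash with the output
  -- columns; the explicit ORDER BY makes the scan order deterministic.
  cur CURSOR FOR SELECT * FROM data d ORDER BY d.user_id, d.created_at;
  cur_user integer;
  cur_x integer;
  cur_y integer;
  total integer;
BEGIN
  OPEN cur;
  FETCH NEXT FROM cur INTO data_row;
  LOOP
    IF NOT FOUND THEN
      EXIT;
    END IF;
    -- Start a new cluster at the current row.
    cur_user := data_row.user_id;
    cur_x := data_row.x;
    cur_y := data_row.y;
    total := data_row.min;
    LOOP
      FETCH NEXT FROM cur INTO data_row;
      IF NOT FOUND THEN
        EXIT;
      END IF;
      IF (data_row.user_id = cur_user) AND (data_row.x = cur_x) AND (data_row.y = cur_y) THEN
        total := total + data_row.min;
      ELSE
        -- The fetched row belongs to the next cluster; the outer loop picks it up.
        EXIT;
      END IF;
    END LOOP;
    -- Emit one row per cluster through the output columns.
    user_id := cur_user;
    x := cur_x;
    y := cur_y;
    sum_int := total;
    RETURN NEXT;
  END LOOP;
  CLOSE cur;
  RETURN;
END;
$$ LANGUAGE plpgsql;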

This is a lot of code and not particularly fast, but it should work.
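
For comparison, here is a sketch of how the same clustering might be done without a cursor, using window functions (available in PostgreSQL 8.4 and later, and in Redshift). It is not from the original answers. It assumes the table is named data, that created_at values are unique within each user_id, and that a cluster is a run of consecutive rows, in created_at order, with the same user_id, x and y. The grp column is a hypothetical helper: the difference between the two row numbers stays constant inside a run and changes when x or y changes.

```sql
-- Sketch only: the table name data and the helper column grp are assumptions.
WITH numbered AS (
    SELECT user_id, x, y, "min", created_at,
           -- position within the user minus position within (user, x, y):
           -- constant for consecutive rows that share x and y
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at)
         - ROW_NUMBER() OVER (PARTITION BY user_id, x, y ORDER BY created_at) AS grp
    FROM data
)
SELECT user_id, x, y, SUM("min") AS sum_min
FROM numbered
GROUP BY user_id, x, y, grp
-- list clusters in the order they first appear for each user
ORDER BY user_id, MIN(created_at);
```

On the sample data in the question, the rows for user_id 1 get grp values 0, 0, 0, 3, 1, 5, 5, so the two separate runs of (1, 1) are summed separately (6 and 15) instead of being merged into 21.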