I have a table (in Postgres 9.1) that looks like this:
CREATE TABLE actions (
    user_id INTEGER,
    date    DATE,
    action  VARCHAR(255),
    count   INTEGER
);
For example:
user_id | date | action | count
---------------+------------+--------------+-------
1 | 2013-01-01 | Email | 1
1 | 2013-01-02 | Call | 3
1 | 2013-01-03 | Email | 3
1 | 2013-01-04 | Call | 2
1 | 2013-01-04 | Voicemail | 2
1 | 2013-01-04 | Email | 2
2 | 2013-01-04 | Email | 2
I would like to see a running total per user for a specific set of actions; for example, Call + Email:
user_id | date | count
-----------+-------------+---------
1 | 2013-01-01 | 1
1 | 2013-01-02 | 4
1 | 2013-01-03 | 7
1 | 2013-01-04 | 11
2 | 2013-01-04 | 2
The monster I have come up with so far is this:
SELECT
date, user_id, SUM(count) OVER (PARTITION BY user_id ORDER BY date) AS count
FROM
actions
WHERE
action IN ('Call', 'Email')
GROUP BY
user_id, date, count;
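This works for single actions, but it seems to break when several actions happen on the same day. For example, instead of the expected 11 on 2013-01-04, we get 9: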
date | user_id | count
------------+--------------+-------
2013-01-01 | 1 | 1
2013-01-02 | 1 | 4
2013-01-03 | 1 | 7
2013-01-04 | 1 | 9 <-- should be 11?
2013-01-04 | 2 | 2
Is it possible to tweak my query to fix this? I tried removing count from the GROUP BY, but Postgres does not seem to like that:
column "actions.count" must appear in the GROUP BY clause or be used in an aggregate function LINE 2: date, user_id, SUM(count) OVER (PARTITION BY user... ^
Answer 0 (score: 1)
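The table has a column named "count", and the expression in the SELECT clause is also aliased as "count", so the name is ambiguous.
Read the documentation: http://www.postgresql.org/docs/9.0/static/sql-select.html#SQL-GROUPBY
In case of ambiguity, a GROUP BY name will be interpreted as an input-column name rather than an output column name.
This means your query does not group by the "count" computed in the SELECT clause, but by the "count" values stored in the table.
This query gives the expected result (see the SQL Fiddle):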
SELECT date, user_id, count
FROM (
    SELECT date, user_id,
           SUM(count) OVER (PARTITION BY user_id ORDER BY date) AS count
    FROM actions
    WHERE action IN ('Call', 'Email')
) alias
GROUP BY user_id, date, count;
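To see the effect of that name resolution directly, here is a small diagnostic sketch. It assumes the sample rows from the question, inlined as a CTE so that it runs on its own; rows_in_group is a hypothetical alias added only for illustration. It groups exactly like the original query and shows how many input rows each group absorbed.

```sql
-- Sample rows copied from the question, inlined so the snippet is self-contained.
WITH actions(user_id, date, action, count) AS (
   VALUES (1, DATE '2013-01-01', 'Email',     1)
        , (1, '2013-01-02',      'Call',      3)
        , (1, '2013-01-03',      'Email',     3)
        , (1, '2013-01-04',      'Call',      2)
        , (1, '2013-01-04',      'Voicemail', 2)
        , (1, '2013-01-04',      'Email',     2)
        , (2, '2013-01-04',      'Email',     2)
)
SELECT user_id, date, count
     , count(*) AS rows_in_group   -- hypothetical alias: input rows per group
FROM   actions
WHERE  action IN ('Call', 'Email')
GROUP  BY user_id, date, count     -- "count" resolves to the input column here
ORDER  BY user_id, date;
```

For user 1 on 2013-01-04 the group has rows_in_group = 2: the Call and Email rows share count = 2 and are merged into one group, so the window function in the original query adds that 2 only once.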
Answer 1 (score: 1)
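It is unclear whether you want to sort by user_id or by date first. It is also unclear whether the result should include dates for which the base table has no rows. If so, refer to this closely related answer: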
PostgreSQL: running count of rows for a query 'by minute'
First, I use this test table instead of the table in the question:
CREATE TEMP TABLE actions (
user_id integer,
thedate date,
action text,
ct integer
);
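Your use of reserved words and function names as identifiers (column names) is part of the problem.
Because aggregate functions are applied before window functions, your original query collapses the two rows found for user_id = 1 and thedate = '2013-01-04' into one, since both carry the same count value. You have to multiply by count(*) to get the actual running count.
You can do this without a subquery, because aggregate functions and window functions can be combined. Aggregate functions are applied first, so you can even run a window function over the result of an aggregate function.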
SELECT thedate
, user_id
, sum(ct * count(*)) OVER (PARTITION BY user_id
ORDER BY thedate) AS running_ct
FROM actions
WHERE action IN ('Call', 'Email')
GROUP BY user_id, thedate, ct
ORDER BY user_id, thedate;
Or simplified:
...
, sum(sum(ct)) OVER (PARTITION BY user_id
ORDER BY thedate) AS running_ct
...
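This should also be the fastest of the proposed solutions.
Here, the inner sum() is an aggregate function, while the outer sum() is a window function applied to the result of the aggregate function.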
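For reference, a complete form of the simplified query could look like the sketch below. The parts hidden behind the ellipses are not spelled out in this answer, so the GROUP BY user_id, thedate used here is an assumption, as are the sample rows, which are copied from the question and inlined as a CTE to keep the snippet self-contained.

```sql
-- Sample rows from the question, using this answer's column names.
WITH actions(user_id, thedate, action, ct) AS (
   VALUES (1, DATE '2013-01-01', 'Email',     1)
        , (1, '2013-01-02',      'Call',      3)
        , (1, '2013-01-03',      'Email',     3)
        , (1, '2013-01-04',      'Call',      2)
        , (1, '2013-01-04',      'Voicemail', 2)
        , (1, '2013-01-04',      'Email',     2)
        , (2, '2013-01-04',      'Email',     2)
)
SELECT thedate
     , user_id
       -- inner sum(): aggregate per (user_id, thedate)
       -- outer sum(): window function over those daily totals
     , sum(sum(ct)) OVER (PARTITION BY user_id
                          ORDER BY thedate) AS running_ct
FROM   actions
WHERE  action IN ('Call', 'Email')
GROUP  BY user_id, thedate          -- assumption: ct is not part of the grouping
ORDER  BY user_id, thedate;
```

With this grouping each (user_id, thedate) pair yields exactly one row, and user 1 reaches 11 on 2013-01-04. The variant that also groups by ct would return two rows with the same running total for a day on which the selected actions have different ct values.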
DISTINCT
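Another way is to use DISTINCT or DISTINCT ON, since both are applied after window functions.
DISTINCT works here because running_ct is guaranteed to be identical for all rows of the same user and date: the default frame definition of window functions sums all peers at once.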
SELECT DISTINCT
thedate
, user_id
, sum(ct) OVER (PARTITION BY user_id ORDER BY thedate) AS running_ct
FROM actions
WHERE action IN ('Call', 'Email')
ORDER BY thedate, user_id;
Or simplified with DISTINCT ON:
SELECT DISTINCT ON (thedate, user_id)
...
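Written out in full, the DISTINCT ON variant might look like the following sketch. The select list and ORDER BY are assumed to mirror the DISTINCT query above, since the answer elides them; the sample rows are copied from the question and inlined as a CTE.

```sql
-- Sample rows from the question, using this answer's column names.
WITH actions(user_id, thedate, action, ct) AS (
   VALUES (1, DATE '2013-01-01', 'Email',     1)
        , (1, '2013-01-02',      'Call',      3)
        , (1, '2013-01-03',      'Email',     3)
        , (1, '2013-01-04',      'Call',      2)
        , (1, '2013-01-04',      'Voicemail', 2)
        , (1, '2013-01-04',      'Email',     2)
        , (2, '2013-01-04',      'Email',     2)
)
SELECT DISTINCT ON (thedate, user_id)   -- keep one row per date and user
       thedate
     , user_id
     , sum(ct) OVER (PARTITION BY user_id ORDER BY thedate) AS running_ct
FROM   actions
WHERE  action IN ('Call', 'Email')
ORDER  BY thedate, user_id;             -- must start with the DISTINCT ON expressions
```

DISTINCT ON only compares the listed expressions, so it does not have to look at running_ct at all. Which of the duplicate rows survives is arbitrary here, but it does not matter because all peers carry the same running_ct.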
Answer 2 (score: 1)
This query produces the result you are looking for:
SELECT DISTINCT
date, user_id, SUM(count) OVER (PARTITION BY user_id ORDER BY date) AS count
FROM actions
WHERE
action IN ('Call', 'Email');
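The default window frame is already what you want, according to the official docs, and DISTINCT eliminates the duplicate rows that appear when an Email and a Call happen on the same day.
See the SQL Fiddle.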
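To make the role of the default frame visible, the sketch below spells it out as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW and compares it with a ROWS frame. The aliases range_total and rows_total are hypothetical, and the sample rows are copied from the question and inlined as a CTE so the snippet runs on its own.

```sql
-- Sample rows copied from the question.
WITH actions(user_id, date, action, count) AS (
   VALUES (1, DATE '2013-01-01', 'Email',     1)
        , (1, '2013-01-02',      'Call',      3)
        , (1, '2013-01-03',      'Email',     3)
        , (1, '2013-01-04',      'Call',      2)
        , (1, '2013-01-04',      'Voicemail', 2)
        , (1, '2013-01-04',      'Email',     2)
        , (2, '2013-01-04',      'Email',     2)
)
SELECT date, user_id, action, count
       -- explicit form of the default frame: includes all peers of the current row
     , SUM(count) OVER (PARTITION BY user_id ORDER BY date
                        RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_total
       -- ROWS frame: stops at the current row, so peers get different sums
     , SUM(count) OVER (PARTITION BY user_id ORDER BY date
                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_total
FROM   actions
WHERE  action IN ('Call', 'Email')
ORDER  BY user_id, date;
```

For user 1 on 2013-01-04, both rows show range_total = 11, which is why DISTINCT can safely fold them into one. With the ROWS frame one of the two rows gets 9 and the other 11, so DISTINCT would not remove either.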