I have the following table, links:
created_at active
2017-08-12 15:46:01 false
2017-08-13 15:46:01 true
2017-08-14 15:46:01 true
2017-08-15 15:46:01 false
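Given a date range, I need to extract a time series that tells me how many active links were created on or before the current (rolling) date.
Output (date range 2017-08-12 to 2017-08-17):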
day count
2017-08-12 0 (there are 0 active links created on 2017-08-12 and earlier)
2017-08-13 1 (there is 1 active link created on 2017-08-13 and earlier)
2017-08-14 2 (there are 2 active links created on 2017-08-14 and earlier)
2017-08-15 2 ...
2017-08-16 2
2017-08-17 2
I came up with the following query for generating the dates:
SELECT date_trunc('day', dd):: date
FROM generate_series
( '2017-08-12'::timestamp
, '2017-08-17'::timestamp
, '1 day'::interval) dd
But the rolling count confuses me and I'm not sure how to proceed. Can this be solved with a window function?
Answer 0 (score: 1)
I would use aggregation and a cumulative sum, assuming you have at least one row per day:
select date_trunc('day', created_at)::date as created_date,
       sum(active::int) as actives,
       -- running total of the per-day sums, ordered by day
       sum(sum(active::int)) over (order by date_trunc('day', created_at)::date) as running_actives
from links
group by created_date
order by created_date;
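If there are gaps in the data, you just need to generate the dates and left join them to this result. If you do that, though, I would suggest adding where active; I left it out here only to make sure days with no active links still produce a row, so there are no gaps.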
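To see why the query above relies on there being at least one row per day, here is a minimal Python sketch of the same group-then-running-sum logic. It is an illustration only, not PostgreSQL; LINKS copies the question's sample rows and running_actives_by_existing_days is a hypothetical name. Grouping by created date only yields rows for dates that occur in the data, so 2017-08-16 and 2017-08-17 never appear:

```python
from collections import Counter
from datetime import datetime
from itertools import accumulate

# Sample rows copied from the question's links table: (created_at, active)
LINKS = [
    (datetime(2017, 8, 12, 15, 46, 1), False),
    (datetime(2017, 8, 13, 15, 46, 1), True),
    (datetime(2017, 8, 14, 15, 46, 1), True),
    (datetime(2017, 8, 15, 15, 46, 1), False),
]


def running_actives_by_existing_days(links):
    """Group by created date, then take a running total (no generated dates)."""
    per_day = Counter()
    for created_at, active in links:
        # Like sum(active::int): a day with only inactive links still gets a row, with 0.
        per_day[created_at.date()] += int(active)
    days = sorted(per_day)
    # Like sum(sum(active::int)) OVER (ORDER BY day): running total over grouped rows.
    return list(zip(days, accumulate(per_day[d] for d in days)))


result = running_actives_by_existing_days(LINKS)
for day, running in result:
    print(day, running)
```

Only the four dates present in the table come back; filling in the rest of the range requires a generated calendar, which is what the other answers add.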
Answer 1 (score: 1)
Demo:
http://rextester.com/OGZV44492
SQL:
SELECT date_trunc('day', dd):: date AS day,
(SELECT COUNT(*) FROM links
WHERE active = true
AND date(created_at) <= date_trunc('day', dd)) AS "count"
FROM generate_series
( '2017-08-12'::timestamp
, '2017-08-17'::timestamp
, '1 day'::interval) dd
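Explanation:
The SQL above runs a simple subselect that, for each date in the generated range, counts the active rows in the links table whose date part is less than or equal to that date.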
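A minimal Python sketch of what the correlated subquery computes, assuming the question's sample rows (LINKS) and hypothetical helper names (day_range, rolling_active_counts_subquery). For every day in the range all rows are scanned again, so the work grows with the number of days times the number of rows:

```python
from datetime import date, datetime, timedelta

# Sample rows copied from the question's links table: (created_at, active)
LINKS = [
    (datetime(2017, 8, 12, 15, 46, 1), False),
    (datetime(2017, 8, 13, 15, 46, 1), True),
    (datetime(2017, 8, 14, 15, 46, 1), True),
    (datetime(2017, 8, 15, 15, 46, 1), False),
]


def day_range(start, end):
    """Yield each date from start to end inclusive (like generate_series with '1 day')."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


def rolling_active_counts_subquery(links, start, end):
    """For each day, rescan every row and count active links created on or before it."""
    return [
        (day, sum(1 for created_at, active in links
                  if active and created_at.date() <= day))
        for day in day_range(start, end)
    ]


for day, count in rolling_active_counts_subquery(LINKS, date(2017, 8, 12), date(2017, 8, 17)):
    print(day, count)
```

Because each day looks at the whole table, links created before the start of the range are included in the count, which is exactly the "on or before" definition from the question.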
Answer 2 (score: 1)
This should be the fastest:
SELECT day::date
     , COALESCE(sum(ct) OVER (ORDER BY day), 0) AS count  -- 0 instead of NULL before the first active link
FROM   generate_series (timestamp '2017-08-12'
                      , timestamp '2017-08-17'
                      , interval  '1 day') day
LEFT   JOIN (
   SELECT date_trunc('day', created_at) AS day, count(*) AS ct
   FROM   links
   WHERE  active  -- fastest
   GROUP  BY 1
   ) t USING (day)
ORDER  BY 1;
dbfiddle here
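count() only counts non-null values, so you could use count(active OR NULL). But the fastest way to count is to exclude irrelevant rows up front with a WHERE clause. Since we add all dates from generate_series() anyway, that is the best option here.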
Since generate_series() returns timestamp (not date), I use date_trunc() to get matching timestamps (very slightly faster).
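The aggregate-first idea can be sketched in Python as well. This is an illustration of the logic only, not PostgreSQL; LINKS copies the question's sample rows, rolling_active_counts is a hypothetical name, and a dict lookup with a default of 0 stands in for the LEFT JOIN plus the NULL handling:

```python
from collections import Counter
from datetime import date, datetime, timedelta
from itertools import accumulate

# Sample rows copied from the question's links table: (created_at, active)
LINKS = [
    (datetime(2017, 8, 12, 15, 46, 1), False),
    (datetime(2017, 8, 13, 15, 46, 1), True),
    (datetime(2017, 8, 14, 15, 46, 1), True),
    (datetime(2017, 8, 15, 15, 46, 1), False),
]


def rolling_active_counts(links, start, end):
    # 1. Aggregate once: active links per day (WHERE active ... GROUP BY 1).
    per_day = Counter(created_at.date() for created_at, active in links if active)
    # 2. Every day in the range (generate_series).
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    # 3. LEFT JOIN: a day without active links contributes 0 instead of NULL.
    daily = [per_day.get(day, 0) for day in days]
    # 4. Running total in day order (sum(ct) OVER (ORDER BY day)).
    return list(zip(days, accumulate(daily)))


for day, count in rolling_active_counts(LINKS, date(2017, 8, 12), date(2017, 8, 17)):
    print(day, count)
```

Each row is touched once and each day once, instead of days times rows. Note that, like the SQL, this only counts links created inside the requested range: a link created before the start date has no matching generated day, so the join drops it, whereas the correlated subquery in Answer 1 counts it. If earlier links matter, their count has to be added as a starting offset.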
Answer 3 (score: 0)
I think a query like this can help you:
with t as (SELECT date_trunc('day', dd)::date AS day
FROM generate_series
( '2017-08-12'::timestamp
, '2017-08-17'::timestamp
, '1 day'::interval) dd
)
select distinct t.day
, count(case when links.active then 1 end) over (order by t.day) AS count  -- order by the calendar day, so days without links keep the running count
from t
left join links
on t.day = cast(links.created_at as date)
order by t.day;
SQL Fiddle Demo
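Answer 4 (score: 0)
If some days are missing from the table, you need generate_series() to create them. Since this basically combines the first two answers, credit goes to them ;)
However, the join is better done after the GROUP BY: that way it joins only one row per day instead of every row of the table, which would make the JOIN larger.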
WITH dailydata AS (
SELECT
d::DATE, COALESCE(n,0) n
FROM
generate_series(
'2000-01-01'::DATE,
'2000-10-01'::DATE,
'1 DAY'::INTERVAL ) d
LEFT JOIN
(SELECT created_at::DATE d, count(*) AS n
FROM links WHERE active
GROUP BY d) data
USING (d)
)
SELECT d, n, sum(n) OVER (ORDER BY d) FROM dailydata;
Answer 5 (score: 0)
CREATE TABLE links
( created_at timestamp
, active boolean
);
INSERT INTO links(created_at,active)VALUES
('2017-08-12 15:46:01', false)
,('2017-08-13 15:46:01', true)
,('2017-08-14 15:46:01', true)
,('2017-08-15 15:46:01', false)
;
WITH cal AS (
  SELECT gs AS deet
  FROM generate_series('2017-08-11'::date, '2017-08-16'::date, '1 day'::interval) gs
)
SELECT cal.deet
     -- order by the calendar day (not created_at, which is NULL on days without links)
     , COALESCE(SUM(1) FILTER (WHERE l.active) OVER (ORDER BY cal.deet), 0) AS cumsum
FROM cal
LEFT JOIN links l ON date_trunc('day', l.created_at) = cal.deet
ORDER BY cal.deet
;