How to generate a date range and count earlier dates from another table in PostgreSQL?

Time: 2017-09-13 21:54:42

Tags: sql postgresql date count range

I have the following table:

links

created_at           active 
2017-08-12 15:46:01  false
2017-08-13 15:46:01  true
2017-08-14 15:46:01  true
2017-08-15 15:46:01  false

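Given a date range, I need to extract a time series that tells me, for each (rolling) date, how many active links were created on or before that date.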

Output (date range 2017-08-12 to 2017-08-17):

day          count
2017-08-12   0 (there are 0 active links created on 2017-08-12 and earlier)
2017-08-13   1 (there is 1 active link created on 2017-08-13 and earlier)
2017-08-14   2 (there are 2 active links created on 2017-08-14 and earlier)
2017-08-15   2 ...
2017-08-16   2
2017-08-17   2

I came up with the following query for generating the dates:

SELECT date_trunc('day', dd):: date
FROM generate_series
    ( '2017-08-12'::timestamp 
    , '2017-08-17'::timestamp
    , '1 day'::interval) dd

But the rolling count confuses me and I'm not sure how to proceed. Can this be solved with a window function?
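To pin down the expected semantics before looking at SQL, here is a minimal sketch in plain Python (not PostgreSQL) that recomputes the sample output: for every day in the range it counts the active links whose created_at date is on or before that day. The function name rolling_active_counts is hypothetical.

```python
from datetime import date, datetime, timedelta

# Sample rows from the question: (created_at, active)
links = [
    (datetime(2017, 8, 12, 15, 46, 1), False),
    (datetime(2017, 8, 13, 15, 46, 1), True),
    (datetime(2017, 8, 14, 15, 46, 1), True),
    (datetime(2017, 8, 15, 15, 46, 1), False),
]


def rolling_active_counts(rows, start, end):
    """For each day in [start, end], count active links created on or before that day."""
    result = []
    day = start
    while day <= end:
        count = sum(1 for created_at, active in rows
                    if active and created_at.date() <= day)
        result.append((day, count))
        day += timedelta(days=1)
    return result


for day, count in rolling_active_counts(links, date(2017, 8, 12), date(2017, 8, 17)):
    print(day, count)
```

Note that links created before the start of the range still count toward every day in it, which is what "created on that date and earlier" implies.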

6 answers:

Answer 0 (score: 1)

I would use aggregation and a cumulative sum, assuming you have at least one row for every day:

select date_trunc('day', created_at)::date as created_date,
       sum(active::int) as actives,
       sum(sum(active::int)) over (order by date_trunc('day', created_at)::date) as running_actives
from links
group by created_date
order by created_date;

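If there are gaps in the data, then you do need to generate the dates. If you do that, however, I would suggest also filtering with WHERE active. You could add that filter now; I left it out only to make sure every day with any row, active or not, shows up without gaps.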
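A minimal sketch of this aggregate-then-running-sum idea, run against SQLite through Python's sqlite3 module as a stand-in for PostgreSQL. Assumptions: active is stored as 0/1 because SQLite has no boolean type, and the nested sum(sum(...)) is written as an equivalent derived table (sum per day first, then a window sum over those daily totals). It also shows the caveat about gaps: only days that actually have rows appear, so 2017-08-16 and 2017-08-17 are missing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite has no boolean type, so active is stored as 0/1 in this sketch.
conn.execute("CREATE TABLE links (created_at TEXT, active INTEGER)")
conn.executemany(
    "INSERT INTO links VALUES (?, ?)",
    [("2017-08-12 15:46:01", 0), ("2017-08-13 15:46:01", 1),
     ("2017-08-14 15:46:01", 1), ("2017-08-15 15:46:01", 0)],
)

# Aggregate per day first, then take a running sum over the daily totals.
# Only days that actually have rows in links appear in the output.
rows = conn.execute("""
    SELECT created_date,
           actives,
           SUM(actives) OVER (ORDER BY created_date) AS running_actives
    FROM (
        SELECT date(created_at) AS created_date,
               SUM(active)      AS actives
        FROM links
        GROUP BY created_date
    )
    ORDER BY created_date
""").fetchall()

for row in rows:
    print(row)
```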

Answer 1 (score: 1)

Demo

http://rextester.com/OGZV44492

SQL

SELECT date_trunc('day', dd):: date AS day,
       (SELECT COUNT(*) FROM links
        WHERE active = true
          AND date(created_at) <= date_trunc('day', dd)) AS "count"
FROM generate_series
    ( '2017-08-12'::timestamp 
    , '2017-08-17'::timestamp
    , '1 day'::interval) dd

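Explanation

The SQL above uses a simple correlated subquery: for each date in the generated range, it counts the active rows in the links table whose date part is less than or equal to that date.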
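A minimal sketch of the same correlated-subquery idea in SQLite via Python's sqlite3 module, standing in for PostgreSQL. SQLite does not provide generate_series() by default, so a recursive CTE generates the days; the helper rolling_counts and the parameter names first_day/last_day are hypothetical. Because the subquery scans the whole links table, links created before the start of the range are still counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (created_at TEXT, active INTEGER)")  # active: 0/1
conn.executemany(
    "INSERT INTO links VALUES (?, ?)",
    [("2017-08-12 15:46:01", 0), ("2017-08-13 15:46:01", 1),
     ("2017-08-14 15:46:01", 1), ("2017-08-15 15:46:01", 0)],
)

# The recursive CTE stands in for generate_series(); for every generated day,
# the correlated subquery counts active links created on or before that day.
QUERY = """
    WITH RECURSIVE days(day) AS (
        SELECT date(:first_day)
        UNION ALL
        SELECT date(day, '+1 day') FROM days WHERE day < date(:last_day)
    )
    SELECT day,
           (SELECT COUNT(*) FROM links
             WHERE active = 1
               AND date(created_at) <= days.day) AS active_count
    FROM days
    ORDER BY day
"""


def rolling_counts(first_day, last_day):
    return conn.execute(QUERY, {"first_day": first_day, "last_day": last_day}).fetchall()


print(rolling_counts("2017-08-12", "2017-08-17"))
# Starting later still counts the active link from 2017-08-13.
print(rolling_counts("2017-08-14", "2017-08-17"))
```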

Answer 2 (score: 1)

This should be the fastest:

SELECT day::date
     , COALESCE(sum(ct) OVER (ORDER BY day), 0) AS count
FROM   generate_series (timestamp '2017-08-12'
                      , timestamp '2017-08-17'
                      , interval  '1 day') day
LEFT   JOIN  (
   SELECT date_trunc('day', created_at) AS day, count(*) AS ct
   FROM   links
   WHERE  active -- fastest
   GROUP  BY 1
   ) t USING (day)
ORDER  BY 1;

dbfiddle here

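count() only counts non-null values, so you could use count(active OR NULL). But the fastest way to count is to exclude irrelevant rows with a WHERE clause. Since we add all days from generate_series() anyway, that is the best option here.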


Since generate_series() returns timestamp (not date), I use date_trunc() in the subquery to get matching timestamps for the join (very slightly faster).
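The count() remark above can be checked with a small sketch in SQLite via Python's sqlite3 (a stand-in for PostgreSQL; active is stored as 0/1 here). "active OR NULL" yields 1 for active rows and NULL, not 0, for inactive ones, so COUNT skips the inactive rows; plain COUNT(active) counts every non-NULL value, including the zeros.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (created_at TEXT, active INTEGER)")  # active: 0/1
conn.executemany(
    "INSERT INTO links VALUES (?, ?)",
    [("2017-08-12 15:46:01", 0), ("2017-08-13 15:46:01", 1),
     ("2017-08-14 15:46:01", 1), ("2017-08-15 15:46:01", 0)],
)

# COUNT(expr) ignores NULLs; "0 OR NULL" is NULL, "1 OR NULL" is 1.
count_or_null = conn.execute(
    "SELECT COUNT(active OR NULL) FROM links").fetchone()[0]

# Filtering first means COUNT(*) only sees the relevant rows.
count_where = conn.execute(
    "SELECT COUNT(*) FROM links WHERE active").fetchone()[0]

# Without the OR NULL trick, the inactive rows (0, not NULL) are counted too.
count_plain = conn.execute("SELECT COUNT(active) FROM links").fetchone()[0]

print(count_or_null, count_where, count_plain)  # 2 2 4
```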

Answer 3 (score: 0)

I think a query like this can help you:

with t as (
  SELECT date_trunc('day', dd)::date AS day
  FROM generate_series
      ( '2017-08-12'::timestamp
      , '2017-08-17'::timestamp
      , '1 day'::interval) dd
)
select distinct t.day
  , count(case when links.active then 1 end) over (order by t.day) as count
from t
left join links
on t.day = cast(links.created_at as date)
order by t.day;

SQL Fiddle Demo
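A minimal SQLite sketch (via Python's sqlite3, standing in for PostgreSQL, with a recursive CTE in place of generate_series()) of the calendar-ordered variant: ordering the window by the generated day rather than by created_at makes all links of one day peers, so they share one cumulative value and DISTINCT collapses them into a single row, while days without links keep the running total. The second active link on 2017-08-14 is hypothetical test data added only to show that case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (created_at TEXT, active INTEGER)")  # active: 0/1
conn.executemany(
    "INSERT INTO links VALUES (?, ?)",
    [("2017-08-12 15:46:01", 0), ("2017-08-13 15:46:01", 1),
     ("2017-08-14 15:46:01", 1), ("2017-08-15 15:46:01", 0),
     # Hypothetical extra row: a second active link on the same day.
     ("2017-08-14 18:00:00", 1)],
)

# Window ordered by the calendar day: rows of the same day are peers and get
# the same cumulative count; DISTINCT (applied after the window) dedupes them.
rows = conn.execute("""
    WITH RECURSIVE t(day) AS (
        SELECT '2017-08-12'
        UNION ALL
        SELECT date(day, '+1 day') FROM t WHERE day < '2017-08-17'
    )
    SELECT DISTINCT t.day,
           COUNT(CASE WHEN links.active THEN 1 END)
               OVER (ORDER BY t.day) AS active_count
    FROM t
    LEFT JOIN links ON t.day = date(links.created_at)
    ORDER BY t.day
""").fetchall()

for row in rows:
    print(row)
```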

Answer 4 (score: 0)

If some days are missing from your table, you need generate_series() to create them. Since this basically combines the two previous answers, credit goes to them ;)

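However, the join is better done after the GROUP BY: the aggregated subquery returns only one row per day instead of one row per link, which would make the JOIN larger.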

WITH dailydata AS (
  SELECT 
    d::DATE, COALESCE(n,0) n
  FROM
    generate_series( 
      '2000-01-01'::DATE, 
      '2000-10-01'::DATE,
      '1 DAY'::INTERVAL ) d
  LEFT JOIN
    (SELECT created_at::DATE d, count(*) AS n
    FROM links WHERE active
    GROUP BY d) data
    USING (d)
)
SELECT d, n, sum(n) OVER (ORDER BY d) FROM dailydata;
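A minimal sketch of this join-after-GROUP BY pattern in SQLite via Python's sqlite3 (a stand-in for PostgreSQL; a recursive CTE replaces generate_series(), and the helper daily_running_totals is hypothetical). Note that the running total only sums days inside the generated range: starting the range at 2017-08-14 ignores the active link from 2017-08-13, unlike the correlated-subquery approach.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (created_at TEXT, active INTEGER)")  # active: 0/1
conn.executemany(
    "INSERT INTO links VALUES (?, ?)",
    [("2017-08-12 15:46:01", 0), ("2017-08-13 15:46:01", 1),
     ("2017-08-14 15:46:01", 1), ("2017-08-15 15:46:01", 0)],
)

# Pre-aggregate active links to one row per day, LEFT JOIN that small result
# to the calendar (COALESCE turns missing days into 0), then take a running sum.
QUERY = """
    WITH RECURSIVE cal(d) AS (
        SELECT date(:first_day)
        UNION ALL
        SELECT date(d, '+1 day') FROM cal WHERE d < date(:last_day)
    ),
    dailydata AS (
        SELECT cal.d AS d, COALESCE(data.n, 0) AS n
        FROM cal
        LEFT JOIN (
            SELECT date(created_at) AS d, COUNT(*) AS n
            FROM links
            WHERE active
            GROUP BY date(created_at)
        ) AS data ON data.d = cal.d
    )
    SELECT d, n, SUM(n) OVER (ORDER BY d) AS running_total
    FROM dailydata
    ORDER BY d
"""


def daily_running_totals(first_day, last_day):
    return conn.execute(QUERY, {"first_day": first_day, "last_day": last_day}).fetchall()


for row in daily_running_totals("2017-08-12", "2017-08-17"):
    print(row)
```

If links from before the range must be included, one option is to add the count of active links created before the first day to every running total.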

Answer 5 (score: 0)

CREATE TABLE links
        ( created_at           timestamp
        , active boolean
        );
INSERT INTO links(created_at,active)VALUES
 ('2017-08-12 15:46:01', false)
,('2017-08-13 15:46:01', true)
,('2017-08-14 15:46:01', true)
,('2017-08-15 15:46:01', false)
        ;
WITH cal AS (
        SELECT gs::date AS deet
        FROM generate_series('2017-08-12'::date, '2017-08-17'::date, '1 day'::interval) gs
        )
SELECT DISTINCT cal.deet
        , COALESCE(SUM(1) FILTER (WHERE l.active) OVER (ORDER BY cal.deet), 0) AS cumsum
FROM cal
LEFT JOIN links l ON l.created_at::date = cal.deet
ORDER BY cal.deet
        ;