SQL: Group by number of entries and entry date

Date: 2014-07-21 12:30:33

Tags: sql postgresql

I have the following table, log:

event_time       | name
-----------------+------
2014-07-16 11:40 | Bob
2014-07-16 10:00 | John
2014-07-16 09:20 | Bob
2014-07-16 08:20 | Bob

2014-07-15 11:20 | Bob
2014-07-15 10:20 | John
2014-07-15 09:00 | Bob

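I want to generate a report that groups the data by entry date and by how many entries each person made that day. For the table above, the report would look like this: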

event_date | 0-2 | 3 | 4-99
-----------+-----+---+-----
2014-07-16 |   1 | 1 |    0
2014-07-15 |   2 | 0 |    0

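This is the approach I am using: for each name, I count the entries per day. Then I check which column that count falls into and add 1 to that column.

If I find the answer before anyone posts one, I'll share it.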
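The approach just described can be sketched in plain Python to check the expected report (an illustration, not the poster's code): count entries per (day, name), then add 1 to the column each count falls into. LOG, bucket and build_report are hypothetical names, and counts above 99 are simply placed in the 4-99 column here.

```python
from collections import Counter, defaultdict

# Sample rows from the question: (event_time, name)
LOG = [
    ("2014-07-16 11:40", "Bob"),
    ("2014-07-16 10:00", "John"),
    ("2014-07-16 09:20", "Bob"),
    ("2014-07-16 08:20", "Bob"),
    ("2014-07-15 11:20", "Bob"),
    ("2014-07-15 10:20", "John"),
    ("2014-07-15 09:00", "Bob"),
]


def bucket(count):
    """Map one person's daily entry count to a report column."""
    if count <= 2:
        return "0-2"
    if count == 3:
        return "3"
    return "4-99"  # assumption: anything above 3 lands here, even > 99


def build_report(rows):
    # Step 1: entries per (day, name); the day is the date part of event_time.
    per_name = Counter((ts[:10], name) for ts, name in rows)
    # Step 2: add 1 to the matching column of that day for every (day, name).
    report = defaultdict(lambda: {"0-2": 0, "3": 0, "4-99": 0})
    for (day, _name), count in per_name.items():
        report[day][bucket(count)] += 1
    return dict(report)


for day, cols in sorted(build_report(LOG).items(), reverse=True):
    print(day, cols["0-2"], cols["3"], cols["4-99"])
```

Bob's three entries on 2014-07-16 put him in column 3, while John's single entry goes to 0-2, which reproduces the table above.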

4 answers:

Answer 0 (score: 2)

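I did this in two steps. The inner query gets the basic counts per day and name; the outer query uses CASE expressions to sum those counts into the buckets.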

select event_date,
  sum(case when cnt between 0 and 2 then 1 else 0 end) as "0-2",
  sum(case when cnt = 3 then 1 else 0 end) as "3",
  sum(case when cnt between 4 and 99 then 1 else 0 end) as "4-99"
from 
    (select cast(event_time as date) as event_date, 
      name,
      count(1) as cnt
    from log
    group by cast(event_time as date), name) baseCnt
group by event_date
order by event_date
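To try the query without a PostgreSQL server, it can be run against the sample data in an in-memory SQLite database. This is a sketch under one assumption: SQLite has no DATE type, so date(event_time) stands in for cast(event_time as date); the rest of the query is unchanged. ROWS, QUERY and run_report are hypothetical names.

```python
import sqlite3

ROWS = [
    ("2014-07-16 11:40", "Bob"),
    ("2014-07-16 10:00", "John"),
    ("2014-07-16 09:20", "Bob"),
    ("2014-07-16 08:20", "Bob"),
    ("2014-07-15 11:20", "Bob"),
    ("2014-07-15 10:20", "John"),
    ("2014-07-15 09:00", "Bob"),
]

# Answer 0's two-step query; date() replaces PostgreSQL's cast(... as date).
QUERY = """
SELECT event_date,
       sum(CASE WHEN cnt BETWEEN 0 AND 2 THEN 1 ELSE 0 END) AS "0-2",
       sum(CASE WHEN cnt = 3 THEN 1 ELSE 0 END) AS "3",
       sum(CASE WHEN cnt BETWEEN 4 AND 99 THEN 1 ELSE 0 END) AS "4-99"
FROM (SELECT date(event_time) AS event_date, name, count(*) AS cnt
      FROM log
      GROUP BY date(event_time), name) AS base_cnt
GROUP BY event_date
ORDER BY event_date
"""


def run_report():
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE log (event_time TEXT, name TEXT)")
        conn.executemany("INSERT INTO log VALUES (?, ?)", ROWS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in run_report():
    print(row)
```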

Answer 1 (score: 1)

Try this:

select da,
  sum(case when c < 3 then 1 else 0 end) as "0-2",
  sum(case when c = 3 then 1 else 0 end) as "3",
  sum(case when c > 3 then 1 else 0 end) as "4-99"
from (
  select cast(event_time as date) as da, count(*) as c
  from log
  group by cast(event_time as date), name) as aa
group by da

Answer 2 (score: 1)

First, aggregate in two steps:

SELECT day, CASE
               WHEN ct < 3 THEN '0-2'
               WHEN ct > 3 THEN '4_or_more'
               ELSE '3'
            END AS cat
      ,count(*)::int AS val
FROM  (
   SELECT event_time::date AS day, count(*) AS ct
   FROM   tbl
   GROUP  BY 1, name   -- count entries per person and day
   ) sub
GROUP  BY 1,2
ORDER  BY 1,2;
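The inner GROUP BY decides what is being counted, and that is easy to check by running the same two-step aggregation in SQLite on the sample data (a sketch: date(event_time) replaces event_time::date, the table is called log, and categorized_counts is a hypothetical helper). Grouping by day and name gives the per-person counts the question asks for; grouping by day alone counts all 4 entries of 2014-07-16 together and puts that day in 4_or_more.

```python
import sqlite3

ROWS = [
    ("2014-07-16 11:40", "Bob"),
    ("2014-07-16 10:00", "John"),
    ("2014-07-16 09:20", "Bob"),
    ("2014-07-16 08:20", "Bob"),
    ("2014-07-15 11:20", "Bob"),
    ("2014-07-15 10:20", "John"),
    ("2014-07-15 09:00", "Bob"),
]


def categorized_counts(per_name=True):
    """Return (day, category, people) rows: the long format fed to crosstab()."""
    # Fixed string chosen by the flag (not user input), so the f-string is safe.
    inner_group = "date(event_time), name" if per_name else "date(event_time)"
    query = f"""
        SELECT day,
               CASE WHEN ct < 3 THEN '0-2'
                    WHEN ct > 3 THEN '4_or_more'
                    ELSE '3' END AS cat,
               count(*) AS val
        FROM (SELECT date(event_time) AS day, count(*) AS ct
              FROM log
              GROUP BY {inner_group}) AS sub
        GROUP BY day, cat
        ORDER BY day, cat
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE log (event_time TEXT, name TEXT)")
        conn.executemany("INSERT INTO log VALUES (?, ?)", ROWS)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


print(categorized_counts())
print(categorized_counts(per_name=False))
```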

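The names themselves do not appear in the result; they are only needed to count entries per person and day. Then take the query and run it through crosstab():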
SELECT *
FROM   crosstab(
   $$SELECT day, CASE
                   WHEN ct < 3 THEN '0-2'
                   WHEN ct > 3 THEN '4_or_more'
                   ELSE '3'
                 END AS cat
           ,count(*)::int AS val
   FROM  (
      SELECT event_time::date AS day, count(*) AS ct
      FROM   tbl
      GROUP  BY 1, name
      ) sub
   GROUP BY 1,2
   ORDER BY 1,2$$

   ,$$VALUES ('0-2'::text), ('3'), ('4_or_more')$$
   ) AS f (day date, "0-2" int, "3" int, "4_or_more" int);

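crosstab() is provided by the additional module tablefunc (install it once per database with CREATE EXTENSION tablefunc;). Details and explanation in this related answer: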
PostgreSQL Crosstab Query
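To make the two-parameter form more concrete, here is a simplified Python model of how crosstab(source_sql, category_sql) places values. It is a sketch, not tablefunc's implementation, and crosstab_model is a hypothetical name: values are matched to output columns by category name, rows for one row name must be adjacent (hence ORDER BY 1), and a category with no source row stays NULL (None here).

```python
from itertools import groupby


def crosstab_model(rows, categories):
    """rows: (row_name, category, value) tuples, already ordered by row_name.
    categories: what the category query returns, in output-column order."""
    position = {cat: i for i, cat in enumerate(categories)}
    result = []
    # One output row per run of adjacent rows sharing the same row_name.
    for row_name, group in groupby(rows, key=lambda r: r[0]):
        values = [None] * len(categories)  # missing categories stay NULL
        for _, cat, val in group:
            if cat in position:  # this sketch ignores unknown categories
                values[position[cat]] = val
        result.append((row_name, *values))
    return result


LONG = [
    ("2014-07-15", "0-2", 2),
    ("2014-07-16", "0-2", 1),
    ("2014-07-16", "3", 1),
]
print(crosstab_model(LONG, ["0-2", "3", "4_or_more"]))
```

In the real query the empty cells come back as NULL rather than 0; if zeros are wanted, the outer SELECT can wrap each column in COALESCE(..., 0).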

Answer 3 (score: 1)

This is a variation of a PIVOT query (PostgreSQL supports this through the crosstab(...) table functions). The existing answers cover the basic technique; I prefer to build the query without CASE.

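To get started we need a couple of things. The first is essentially a Calendar Table, or entries from one (if you don't have one yet, they are among the most useful dimension tables). If you don't have one, entries for a given date range are easy to generate: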

WITH Calendar_Range AS (SELECT startOfDay, startOfDay + INTERVAL '1 DAY' AS nextDay
                        FROM GENERATE_SERIES(CAST('2014-07-01' AS DATE),
                                             CAST('2014-08-01' AS DATE),
                                             INTERVAL '1 DAY') AS dr(startOfDay))
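The half-open (startOfDay, nextDay) pairs this CTE produces can be mirrored in Python as a quick check of the boundaries (a sketch; calendar_range is a hypothetical helper). Both endpoints are included, as with generate_series, so 2014-07-01 through 2014-08-01 yields 32 days.

```python
from datetime import date, timedelta


def calendar_range(first, last):
    """Yield (start_of_day, next_day) for every day from first to last inclusive."""
    day = first
    while day <= last:
        nxt = day + timedelta(days=1)
        yield day, nxt  # next_day is an exclusive upper bound
        day = nxt


days = list(calendar_range(date(2014, 7, 1), date(2014, 8, 1)))
print(len(days), days[0], days[-1])
```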

Its main use is the first step of the two-level aggregation:

SELECT Calendar_Range.startOfDay, COUNT(Log.name)
FROM Calendar_Range
LEFT JOIN Log
       ON Log.event_time >= Calendar_Range.startOfDay
          AND Log.event_time < Calendar_Range.nextDay
GROUP BY Calendar_Range.startOfDay, Log.name
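The effect of the LEFT JOIN on the half-open day range can be seen by running the same step in SQLite (a sketch under stated assumptions: a recursive CTE replaces GENERATE_SERIES, dates are compared as ISO text, and entries_per_day_and_name is a hypothetical helper). Days without any entries still produce one row, with a count of 0, because COUNT(log.name) ignores the NULL from the outer join.

```python
import sqlite3

ROWS = [
    ("2014-07-16 11:40", "Bob"),
    ("2014-07-16 10:00", "John"),
    ("2014-07-16 09:20", "Bob"),
    ("2014-07-16 08:20", "Bob"),
    ("2014-07-15 11:20", "Bob"),
    ("2014-07-15 10:20", "John"),
    ("2014-07-15 09:00", "Bob"),
]

QUERY = """
WITH RECURSIVE calendar_range(start_of_day, next_day) AS (
    SELECT '2014-07-14', date('2014-07-14', '+1 day')
    UNION ALL
    SELECT next_day, date(next_day, '+1 day')
    FROM calendar_range
    WHERE next_day <= '2014-07-17'          -- inclusive end, like generate_series
)
SELECT calendar_range.start_of_day, count(log.name) AS cnt
FROM calendar_range
LEFT JOIN log
       ON log.event_time >= calendar_range.start_of_day
      AND log.event_time < calendar_range.next_day
GROUP BY calendar_range.start_of_day, log.name
ORDER BY calendar_range.start_of_day, cnt
"""


def entries_per_day_and_name():
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE log (event_time TEXT, name TEXT)")
        conn.executemany("INSERT INTO log VALUES (?, ?)", ROWS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(entries_per_day_and_name())
```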

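Keep in mind that most aggregate functions over a nullable expression (here COUNT(Log.name)) ignore null values (they don't count them), so a day with no log entries gets a count of 0. This is also one of the few times it is fine to leave a grouping column (Log.name) out of the SELECT list (usually that makes the results ambiguous). For the actual query I'll put this in a subquery, but it could also be used as a CTE.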

We also need a way to build our COUNT ranges. This is just as simple:

     Count_Range AS (SELECT text, start, LEAD(start) OVER(ORDER BY start) as next
                     FROM (VALUES('0 - 2', 0),
                                 ('3', 3),
                                 ('4+', 4)) e(text, start))

As with nextDay, each range's next value is treated as an exclusive upper bound; the last range (4+) has none.
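What LEAD(start) OVER (ORDER BY start) computes here can be sketched in Python (count_ranges and label_for are hypothetical helpers): each range's exclusive upper bound is the next range's start, and the last range gets None, which plays the role of NULL.

```python
def count_ranges(starts):
    """Build (label, start, next) rows the way LEAD(start) OVER (ORDER BY start) does."""
    ordered = sorted(starts, key=lambda item: item[1])
    return [
        (label, start, ordered[i + 1][1] if i + 1 < len(ordered) else None)
        for i, (label, start) in enumerate(ordered)
    ]


def label_for(count, ranges):
    # start is inclusive, next is exclusive; None means "no upper bound".
    for label, start, nxt in ranges:
        if count >= start and (nxt is None or count < nxt):
            return label
    return None


RANGES = count_ranges([("0 - 2", 0), ("3", 3), ("4+", 4)])
print(RANGES)
print([label_for(n, RANGES) for n in (0, 2, 3, 4, 100)])
```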

We now have all the pieces the query needs, and these virtual tables let us write it in the styles of both existing answers.


First, the SUM(CASE...) style. For this query we again take advantage of aggregate functions ignoring nulls:

WITH Calendar_Range AS (SELECT startOfDay, startOfDay + INTERVAL '1 DAY' AS nextDay
                        FROM GENERATE_SERIES(CAST('2014-07-14' AS DATE),
                                             CAST('2014-07-17' AS DATE),
                                             INTERVAL '1 DAY') AS dr(startOfDay)),
     Count_Range AS (SELECT text, start, LEAD(start) OVER(ORDER BY start) as next
                     FROM (VALUES('0 - 2', 0),
                                 ('3', 3),
                                 ('4+', 4)) e(text, start))
SELECT startOfDay, 
       COUNT(Zero_To_Two.text) AS Zero_To_Two, 
       COUNT(Three.text) AS Three, 
       COUNT(Four_And_Up.text) AS Four_And_Up
FROM (SELECT Calendar_Range.startOfDay, COUNT(Log.name) AS count
      FROM Calendar_Range
      LEFT JOIN Log
             ON Log.event_time >= Calendar_Range.startOfDay
                AND Log.event_time < Calendar_Range.nextDay
      GROUP BY Calendar_Range.startOfDay, Log.name) Entry_Count
LEFT JOIN Count_Range Zero_To_Two
       ON Zero_To_Two.text = '0 - 2'
          AND Entry_Count.count >= Zero_To_Two.start 
          AND Entry_Count.count < Zero_To_Two.next 
LEFT JOIN Count_Range Three
       ON Three.text = '3'
          AND Entry_Count.count >= Three.start 
          AND Entry_Count.count < Three.next 
LEFT JOIN Count_Range Four_And_Up
       ON Four_And_Up.text = '4+'
          AND Entry_Count.count >= Four_And_Up.start
GROUP BY startOfDay
ORDER BY startOfDay
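The join-based bucketing can be checked the same way in SQLite (a sketch: a recursive CTE replaces GENERATE_SERIES, the Count_Range bounds are written out instead of using LEAD, its columns are renamed to label/start_count/next_count, and bucket_report is a hypothetical helper). One behavior worth knowing: a day with no entries yields a single count of 0, which falls into the 0 - 2 range, so 2014-07-14 and 2014-07-17 show 1 in that column.

```python
import sqlite3

ROWS = [
    ("2014-07-16 11:40", "Bob"), ("2014-07-16 10:00", "John"),
    ("2014-07-16 09:20", "Bob"), ("2014-07-16 08:20", "Bob"),
    ("2014-07-15 11:20", "Bob"), ("2014-07-15 10:20", "John"),
    ("2014-07-15 09:00", "Bob"),
]

QUERY = """
WITH RECURSIVE
calendar_range(start_of_day, next_day) AS (
    SELECT '2014-07-14', date('2014-07-14', '+1 day')
    UNION ALL
    SELECT next_day, date(next_day, '+1 day')
    FROM calendar_range WHERE next_day <= '2014-07-17'
),
count_range(label, start_count, next_count) AS (
    VALUES ('0 - 2', 0, 3), ('3', 3, 4), ('4+', 4, NULL)
)
SELECT entry_count.start_of_day,
       count(zero_to_two.label) AS zero_to_two,
       count(three.label) AS three,
       count(four_and_up.label) AS four_and_up
FROM (SELECT calendar_range.start_of_day, count(log.name) AS cnt
      FROM calendar_range
      LEFT JOIN log
             ON log.event_time >= calendar_range.start_of_day
            AND log.event_time < calendar_range.next_day
      GROUP BY calendar_range.start_of_day, log.name) AS entry_count
LEFT JOIN count_range AS zero_to_two
       ON zero_to_two.label = '0 - 2'
      AND entry_count.cnt >= zero_to_two.start_count
      AND entry_count.cnt < zero_to_two.next_count
LEFT JOIN count_range AS three
       ON three.label = '3'
      AND entry_count.cnt >= three.start_count
      AND entry_count.cnt < three.next_count
LEFT JOIN count_range AS four_and_up
       ON four_and_up.label = '4+'
      AND entry_count.cnt >= four_and_up.start_count
GROUP BY entry_count.start_of_day
ORDER BY entry_count.start_of_day
"""


def bucket_report():
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE log (event_time TEXT, name TEXT)")
        conn.executemany("INSERT INTO log VALUES (?, ?)", ROWS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in bucket_report():
    print(row)
```

If empty days should show all zeros instead, one option is to start the 0 - 2 range at 1, so that a count of 0 matches no range while the day's row is still kept.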


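The other option is, of course, a crosstab query. Where the other answers fed it with CASE to bucket the results, here the Count_Range table decodes the values: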

-- crosstab() comes from the tablefunc extension
SELECT startOfDay, "0 - 2", "3", "4+"
FROM CROSSTAB($$WITH Calendar_Range AS (SELECT startOfDay, startOfDay + INTERVAL '1 DAY' AS nextDay
                                        FROM GENERATE_SERIES(CAST('2014-07-14' AS DATE),
                                                             CAST('2014-07-17' AS DATE),
                                                             INTERVAL '1 DAY') AS dr(startOfDay)),
                     Count_Range AS (SELECT text, start, LEAD(start) OVER(ORDER BY start) as next
                                     FROM (VALUES('0 - 2', 0),
                                                 ('3', 3),
                                                 ('4+', 4)) e(text, start))
                -- casts make the types match the column definition list (DATE, INT)
                SELECT Entry_Count.startOfDay::date, Count_Range.text, COUNT(*)::int AS count
                FROM (SELECT Calendar_Range.startOfDay, COUNT(Log.name) AS count
                      FROM Calendar_Range
                      LEFT JOIN Log
                             ON Log.event_time >= Calendar_Range.startOfDay
                                AND Log.event_time < Calendar_Range.nextDay
                      GROUP BY Calendar_Range.startOfDay, Log.name) Entry_Count
                JOIN Count_Range
                  ON Entry_Count.count >= Count_Range.start
                     AND (Entry_Count.count < Count_Range.next OR Count_Range.next IS NULL)
                GROUP BY Entry_Count.startOfDay, Count_Range.text
                ORDER BY Entry_Count.startOfDay, Count_Range.text$$,
              $$VALUES ('0 - 2'), ('3'), ('4+')$$) AS Data(startOfDay DATE, "0 - 2" INT, "3" INT, "4+" INT)

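(I believe this is correct, but I have no way to test it: the Fiddle doesn't seem to load the crosstab function. Note that the CTEs have to live inside the query string, because crosstab() runs that text as a separate query and cannot see CTEs defined in the outer statement.)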