Group by column and row, with counts and percentages for each day

Time: 2018-08-13 12:26:11

Tags: sql postgresql group-by amazon-redshift

I have a table with data like the following.

attr            |time         
----------------|--------------------------
abc             |2018-08-06 10:17:25.282546
def             |2018-08-06 10:17:25.325676
pqr             |2018-08-05 10:17:25.366823
abc             |2018-08-06 10:17:25.407941
def             |2018-08-05 10:17:25.449249

I want to group the rows by the attr column and count them, with additional columns showing the count and percentage for each day, as shown below.

attr            |day1_count|day1_% |day2_count |day2_%
----------------|----------|-------|-----------|-------
abc             |2         |66.6%  |0          |0.0%
def             |1         |33.3%  |1          |50.0%
pqr             |0         |0.0%   |1          |50.0%

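I can show a single count using GROUP BY, but I can't work out how to split the counts into multiple columns. I tried the following to generate the day1 percentage: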
SELECT attr, count(attr), count(attr) / sum(sub.day1_count) * 100 as percentage from (
    SELECT attr, count(*) as day1_count FROM my_table WHERE DATEPART(week, time) = DATEPART(day, GETDate()) GROUP BY attr) as sub
    GROUP BY attr;

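But this doesn't give me the right answer either: every percentage comes back as zero and every count as 1. I'm trying to do this in Redshift, which follows PostgreSQL syntax.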

4 answers:

Answer 0: (score: 0)

Get the logic right first, before worrying about the presentation:

with CTE1 as
(
select attr, DATEPART(day, time) as theday, count(*) as thecount
from MyTable
group by attr, DATEPART(day, time)
)
, CTE2 as
(
select theday, sum(thecount) as daytotal
from CTE1
group by theday
)
select t1.attr, t1.theday, t1.thecount, t1.thecount/t2.daytotal as percentofday
from CTE1 t1
inner join CTE2 t2
on t1.theday = t2.theday

From here you can pivot the result into one column per day if you need to.
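
As a rough check of this logic, the sketch below runs the same three steps (count per attr and day, total per day, share of the day) against the sample rows from the question. It uses Python's built-in sqlite3 module as a stand-in for Redshift, purely so the example is self-contained; substr(time, 1, 10) replaces DATEPART / time::date, and the names ROWS, QUERY and per_day_shares are illustrative, not part of the original answer.

```python
import sqlite3

# Sample rows from the question (attr, time).
ROWS = [
    ("abc", "2018-08-06 10:17:25.282546"),
    ("def", "2018-08-06 10:17:25.325676"),
    ("pqr", "2018-08-05 10:17:25.366823"),
    ("abc", "2018-08-06 10:17:25.407941"),
    ("def", "2018-08-05 10:17:25.449249"),
]

# substr(time, 1, 10) keeps the YYYY-MM-DD part of the timestamp text;
# it is an assumption standing in for time::date on PostgreSQL/Redshift.
QUERY = """
WITH cte1 AS (
    SELECT attr, substr(time, 1, 10) AS theday, COUNT(*) AS thecount
    FROM my_table
    GROUP BY attr, substr(time, 1, 10)
),
cte2 AS (
    SELECT theday, SUM(thecount) AS daytotal
    FROM cte1
    GROUP BY theday
)
SELECT t1.attr, t1.theday, t1.thecount,
       ROUND(100.0 * t1.thecount / t2.daytotal, 1) AS percentofday
FROM cte1 t1
JOIN cte2 t2 ON t1.theday = t2.theday
ORDER BY t1.theday DESC, t1.attr
"""


def per_day_shares(rows):
    """Load the rows into an in-memory table and return one row per (attr, day)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (attr TEXT, time TEXT)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in per_day_shares(ROWS):
        print(row)
```

Multiplying by 100.0 before dividing forces floating-point arithmetic, which sidesteps any integer-division surprises regardless of the engine.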

Answer 1: (score: 0)

Here is my attempt at extending @johnHC's query; the same pattern applies if you need 7 days:

with CTE1 as
(
select attr, time::date as theday, count(*) as thecount
from t group by attr,time::date
)
, CTE2 as
(
select theday, sum(thecount) as daytotal
from CTE1
group by theday
)
,
CTE3 as
(
select t1.attr, EXTRACT(DOW FROM  t1.theday) as day_nmbr,t1.theday, t1.thecount, t1.thecount/t2.daytotal as percentofday
from CTE1 t1
inner join CTE2 t2
on t1.theday = t2.theday
) 
select CTE3.attr,
max(case when day_nmbr=0 then CTE3.thecount end) as day1Cnt,
max(case when day_nmbr=0 then percentofday end) as day1,
max(case when day_nmbr=1 then CTE3.thecount end) as day2Cnt,
max( case when day_nmbr=1 then percentofday end) day2 
from  CTE3 group by CTE3.attr

http://sqlfiddle.com/#!17/54ace/20
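
One thing to keep in mind with this query: in PostgreSQL, EXTRACT(DOW FROM ...) returns the day of the week, with 0 for Sunday through 6 for Saturday, so day_nmbr = 0 selects Sundays rather than the first day present in the data. The short Python sketch below mimics that numbering for the dates in the question; postgres_dow is a hypothetical helper written for illustration, not a database function.

```python
from datetime import date


def postgres_dow(d):
    """Mimic EXTRACT(DOW FROM d): 0 = Sunday, 1 = Monday, ..., 6 = Saturday."""
    # isoweekday() is 1 (Monday) .. 7 (Sunday), so taking it modulo 7 maps Sunday to 0.
    return d.isoweekday() % 7


if __name__ == "__main__":
    for d in (date(2018, 8, 5), date(2018, 8, 6), date(2018, 8, 12)):
        print(d.isoformat(), postgres_dow(d))
```

With the sample data, 2018-08-05 is a Sunday and 2018-08-06 a Monday, so the two days land in separate columns. A date one week later, such as 2018-08-12, would map to the same number as 2018-08-05, so data spanning more than a week would fold several dates into one column.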

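Answer 2: (score: 0)

If you only have two days:

http://sqlfiddle.com/#!17/3bdad/3 (days ordered as in your example, from left to right)

http://sqlfiddle.com/#!17/3bdad/5 (days in ascending order)

The main idea has already been mentioned in the other answers. Instead of using CTEs to calculate the values, I use window functions, which are a bit shorter and more readable. The pivot is done the same way.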

 SELECT 
    attr, 
    COALESCE(max(count) FILTER (WHERE day_number = 0), 0) as day1_count,         -- D
    COALESCE(max(percent) FILTER (WHERE day_number = 0), 0) as day1_percent, 
    COALESCE(max(count) FILTER (WHERE day_number = 1), 0) as day2_count,
    COALESCE(max(percent) FILTER (WHERE day_number = 1), 0) as day2_percent
    /*
       Add more days here
    */
FROM(
    SELECT *, (count::float/count_per_day)::decimal(5, 2) as percent             -- C
    FROM (
        SELECT DISTINCT
            attr, 
            MAX(time::date) OVER () - time::date as day_number,                  -- B
            count(*) OVER (partition by time::date, attr) as count,              -- A
            count(*) OVER (partition by time::date) as count_per_day 
        FROM test_table
    )s
)s

GROUP BY attr
ORDER BY attr

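A: counts the rows per day and attr; the line below it counts the rows per day.

B: for readability, I convert the date into a number. Here I calculate the difference between the row's date and the latest date in the table, which gives a counter that is 0 for the most recent day and grows by one for each day further back.

C: calculates the percentage (as a fraction) and rounds it to two decimal places.

D: pivots by filtering on the day number. COALESCE replaces NULL values with 0. To add more days, repeat these columns with the next day number.

Edit: made the day counter more robust so that it works for more days; new SQL Fiddle.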
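
For engines where the FILTER clause is not available, the same pivot can be written with CASE expressions. The sketch below is one possible variant, not the query from this answer: it runs on an in-memory SQLite database through Python's sqlite3 module (an assumption made so the example is self-contained), computes the day number as the difference to the latest date, and pivots two days. The names ROWS, QUERY and pivot_by_day are illustrative.

```python
import sqlite3

# Sample rows from the question (attr, time).
ROWS = [
    ("abc", "2018-08-06 10:17:25.282546"),
    ("def", "2018-08-06 10:17:25.325676"),
    ("pqr", "2018-08-05 10:17:25.366823"),
    ("abc", "2018-08-06 10:17:25.407941"),
    ("def", "2018-08-05 10:17:25.449249"),
]

# substr(time, 1, 10) stands in for time::date; julianday() differences
# stand in for subtracting two dates.
QUERY = """
WITH daily AS (
    -- one row per (attr, day); day_number is 0 for the latest date,
    -- 1 for the day before, and so on
    SELECT attr,
           CAST(julianday((SELECT MAX(substr(time, 1, 10)) FROM test_table))
                - julianday(substr(time, 1, 10)) AS INTEGER) AS day_number,
           COUNT(*) AS cnt
    FROM test_table
    GROUP BY attr, substr(time, 1, 10)
),
totals AS (
    SELECT day_number, SUM(cnt) AS day_total
    FROM daily
    GROUP BY day_number
)
SELECT d.attr,
       SUM(CASE WHEN d.day_number = 0 THEN d.cnt ELSE 0 END) AS day1_count,
       SUM(CASE WHEN d.day_number = 0
                THEN ROUND(100.0 * d.cnt / t.day_total, 1) ELSE 0 END) AS day1_percent,
       SUM(CASE WHEN d.day_number = 1 THEN d.cnt ELSE 0 END) AS day2_count,
       SUM(CASE WHEN d.day_number = 1
                THEN ROUND(100.0 * d.cnt / t.day_total, 1) ELSE 0 END) AS day2_percent
FROM daily d
JOIN totals t ON t.day_number = d.day_number
GROUP BY d.attr
ORDER BY d.attr
"""


def pivot_by_day(rows):
    """Return one row per attr with count and percentage columns for two days."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test_table (attr TEXT, time TEXT)")
    conn.executemany("INSERT INTO test_table VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in pivot_by_day(ROWS):
        print(row)
```

The ELSE 0 branch plays the role of COALESCE in the query above: an attr with no rows on a given day contributes 0 rather than NULL.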

Answer 3: (score: 0)

Basically, I see this as conditional aggregation. However, you need an enumerator for the days to pivot on. So:

SELECT attr, 
       COUNT(*) FILTER (WHERE day_number = 1) as day1_count, 
       COUNT(*) FILTER (WHERE day_number = 1) / cnt as day1_percent, 
       COUNT(*) FILTER (WHERE day_number = 2) as day2_count,
       COUNT(*) FILTER (WHERE day_number = 2) / cnt as day2_percent
FROM (SELECT attr,
             DENSE_RANK() OVER (ORDER BY time::date DESC) as day_number,
             1.0 * COUNT(*) OVER (PARTITION BY attr) as cnt
      FROM test_table
     ) s
GROUP BY attr, cnt
ORDER BY attr;

Here is a SQL Fiddle.
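
The two ways of numbering days that appear in these answers behave differently when dates are missing. The Python sketch below compares them on a hypothetical set of dates with a gap (2018-08-02 is not in the question's data and is added only for illustration): subtracting from the latest date leaves holes in the numbering, while a dense rank over the distinct dates does not. Both function names are made up for this example.

```python
from datetime import date


def day_numbers_by_difference(days):
    """0 for the latest date, then the number of calendar days before it."""
    latest = max(days)
    return {d: (latest - d).days for d in set(days)}


def day_numbers_by_dense_rank(days):
    """1 for the latest date, 2 for the next distinct date, with no gaps."""
    ordered = sorted(set(days), reverse=True)
    return {d: rank for rank, d in enumerate(ordered, start=1)}


if __name__ == "__main__":
    # 2018-08-02 is a hypothetical extra date that creates a gap.
    days = [date(2018, 8, 6), date(2018, 8, 6), date(2018, 8, 5), date(2018, 8, 2)]
    print(day_numbers_by_difference(days))
    print(day_numbers_by_dense_rank(days))
```

Which behaviour is preferable depends on whether a column should mean "n calendar days ago" or "the n-th most recent day with data".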