I have a table containing data like the following.
attr |time
----------------|--------------------------
abc |2018-08-06 10:17:25.282546
def |2018-08-06 10:17:25.325676
pqr |2018-08-05 10:17:25.366823
abc |2018-08-06 10:17:25.407941
def |2018-08-05 10:17:25.449249
I want to group the rows by the attr column and count them, with additional columns showing the count and percentage for each day, as shown below.
attr |day1_count| day1_%| day2_count| day2_%
----------------|----------|-------|-----------|-------
abc |2 |66.6% | 0 | 0.0%
def |1 |33.3% | 1 | 50.0%
pqr |0 |0.0% | 1 | 50.0%
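I can show a single count with GROUP BY, but I can't figure out how to split the counts into multiple columns. I tried the following to produce the day1 percentage:
SELECT attr, count(attr), count(attr) / sum(sub.day1_count) * 100 as percentage from (
SELECT attr, count(*) as day1_count FROM my_table WHERE DATEPART(week, time) = DATEPART(day, GETDate()) GROUP BY attr) as sub
GROUP BY attr;
But this doesn't give me the right answer either: the percentage is 0 and the count is 1 for every row. I'm trying to do this in Redshift, which follows PostgreSQL syntax.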
Answer 0: (score: 0)
Before worrying about presentation, get the logic right first, along these lines:
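with CTE1 as
(
-- rows per attr and day (DATEPART(day, ...) is the day of the month)
select attr, DATEPART(day, time) as theday, count(*) as thecount
from MyTable
group by attr, DATEPART(day, time)
)
, CTE2 as
(
-- total rows per day
select theday, sum(thecount) as daytotal
from CTE1
group by theday
)
-- 1.0 * forces decimal division; integer / integer would truncate to 0
select t1.attr, t1.theday, t1.thecount, 1.0 * t1.thecount / t2.daytotal as percentofday
from CTE1 t1
inner join CTE2 t2
on t1.theday = t2.theday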
From here you can build the per-day columns if you need them.
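To check the logic against the sample data from the question, here is a minimal sketch that runs the same two-step aggregation in an in-memory SQLite database from Python. SQLite only stands in for Redshift here, so this is an assumption-laden illustration: `date(time)` replaces `DATEPART(day, time)`, and the table name `my_table` is taken from the question.

```python
import sqlite3

# Sample rows from the question.
ROWS = [
    ("abc", "2018-08-06 10:17:25.282546"),
    ("def", "2018-08-06 10:17:25.325676"),
    ("pqr", "2018-08-05 10:17:25.366823"),
    ("abc", "2018-08-06 10:17:25.407941"),
    ("def", "2018-08-05 10:17:25.449249"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (attr TEXT, time TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)", ROWS)

# SQLite dialect: date(time) truncates the timestamp to a calendar day.
QUERY = """
WITH cte1 AS (
    SELECT attr, date(time) AS theday, COUNT(*) AS thecount
    FROM my_table
    GROUP BY attr, date(time)
),
cte2 AS (
    SELECT theday, SUM(thecount) AS daytotal
    FROM cte1
    GROUP BY theday
)
SELECT t1.attr, t1.theday, t1.thecount,
       ROUND(100.0 * t1.thecount / t2.daytotal, 1) AS percentofday
FROM cte1 t1
JOIN cte2 t2 ON t1.theday = t2.theday
ORDER BY t1.theday DESC, t1.attr
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Each output row is one (attr, day) pair with its count and its share of that day's total, which is the long form of the table the question asks for.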
Answer 1: (score: 0)
I tried to build on @johnHC's query. If you need 7 days, add more CASE WHEN columns in the same way:
with CTE1 as
(
select attr, time::date as theday, count(*) as thecount
from t group by attr,time::date
)
, CTE2 as
(
select theday, sum(thecount) as daytotal
from CTE1
group by theday
)
,
CTE3 as
(
select t1.attr, EXTRACT(DOW FROM t1.theday) as day_nmbr,t1.theday, t1.thecount, 1.0 * t1.thecount/t2.daytotal as percentofday
from CTE1 t1
inner join CTE2 t2
on t1.theday = t2.theday
)
select CTE3.attr,
max(case when day_nmbr=0 then CTE3.thecount end) as day1Cnt,
max(case when day_nmbr=0 then percentofday end) as day1,
max(case when day_nmbr=1 then CTE3.thecount end) as day2Cnt,
max(case when day_nmbr=1 then percentofday end) as day2
from CTE3 group by CTE3.attr
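A rough sketch of how this pivot behaves on the question's sample data, again using SQLite from Python as a stand-in. The assumptions: `strftime('%w', ...)` plays the role of `EXTRACT(DOW FROM ...)` (both number Sunday as 0), `date(time)` replaces `time::date`, and the table is called `t` as in the query above.

```python
import sqlite3

ROWS = [
    ("abc", "2018-08-06 10:17:25.282546"),
    ("def", "2018-08-06 10:17:25.325676"),
    ("pqr", "2018-08-05 10:17:25.366823"),
    ("abc", "2018-08-06 10:17:25.407941"),
    ("def", "2018-08-05 10:17:25.449249"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (attr TEXT, time TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", ROWS)

QUERY = """
WITH cte1 AS (
    SELECT attr, date(time) AS theday, COUNT(*) AS thecount
    FROM t
    GROUP BY attr, date(time)
),
cte2 AS (
    SELECT theday, SUM(thecount) AS daytotal
    FROM cte1
    GROUP BY theday
),
cte3 AS (
    -- strftime('%w', ...) is the SQLite stand-in for EXTRACT(DOW ...): 0 = Sunday
    SELECT t1.attr,
           CAST(strftime('%w', t1.theday) AS INTEGER) AS day_nmbr,
           t1.thecount,
           1.0 * t1.thecount / t2.daytotal AS percentofday
    FROM cte1 t1
    JOIN cte2 t2 ON t1.theday = t2.theday
)
SELECT attr,
       MAX(CASE WHEN day_nmbr = 0 THEN thecount END)     AS day1cnt,
       MAX(CASE WHEN day_nmbr = 0 THEN percentofday END) AS day1,
       MAX(CASE WHEN day_nmbr = 1 THEN thecount END)     AS day2cnt,
       MAX(CASE WHEN day_nmbr = 1 THEN percentofday END) AS day2
FROM cte3
GROUP BY attr
ORDER BY attr
"""

pivot = conn.execute(QUERY).fetchall()
for row in pivot:
    print(row)
```

Two things are worth noticing when running it. 2018-08-05 is a Sunday, so with weekday numbering it lands in the day1 columns and 2018-08-06 in the day2 columns, the reverse of the layout in the question. An attr with no rows on a given day comes back as NULL (None in Python) rather than 0, so wrap the columns in COALESCE if zeros are wanted. Weekday numbers also repeat every week, so this only separates days cleanly when the data spans at most seven days.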
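Answer 2: (score: 0)
If you only have two days:
http://sqlfiddle.com/#!17/3bdad/3 (days ordered as in your example, from left to right)
http://sqlfiddle.com/#!17/3bdad/5 (days in ascending order)
The main idea has already been mentioned in the other answers. Instead of using CTEs to calculate the values, I use window functions, which are a bit shorter and more readable. The pivot is done the same way.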
SELECT
attr,
COALESCE(max(count) FILTER (WHERE day_number = 0), 0) as day1_count, -- D
COALESCE(max(percent) FILTER (WHERE day_number = 0), 0) as day1_percent,
COALESCE(max(count) FILTER (WHERE day_number = 1), 0) as day2_count,
COALESCE(max(percent) FILTER (WHERE day_number = 1), 0) as day2_percent
/*
Add more days here
*/
FROM(
SELECT *, (count::float/count_per_day)::decimal(5, 2) as percent -- C
FROM (
SELECT DISTINCT
attr,
MAX(time::date) OVER () - time::date as day_number, -- B
count(*) OVER (partition by time::date, attr) as count, -- A
count(*) OVER (partition by time::date) as count_per_day
FROM test_table
)s
)s
GROUP BY attr
ORDER BY attr
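A: counts the rows per day, and the rows per day AND attr.
B: for readability I convert the date into a number. Here I take the difference between the row's date and the maximum date available in the table, so I get a counter from 0 (the first day column, i.e. the most recent date) to n - 1 (the last).
C: calculates the percentage and rounds it.
D: pivots by filtering on the day number. COALESCE avoids NULL values and turns them into 0. To add more days, repeat these columns with the next day numbers.
Edit: made the day counter more flexible so it works for more days; new SQL Fiddle.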
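The question targets Redshift, which as far as I know does not accept the FILTER clause. Under that assumption, steps A to D can be sketched with CASE expressions instead. The version below runs in SQLite from Python purely as a stand-in, so `julianday(...)` differences replace the date subtraction, `date(time)` replaces `time::date`, and the column `count` is renamed to the hypothetical `cnt`. It needs an SQLite build with window functions (3.25 or later).

```python
import sqlite3

ROWS = [
    ("abc", "2018-08-06 10:17:25.282546"),
    ("def", "2018-08-06 10:17:25.325676"),
    ("pqr", "2018-08-05 10:17:25.366823"),
    ("abc", "2018-08-06 10:17:25.407941"),
    ("def", "2018-08-05 10:17:25.449249"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_table (attr TEXT, time TEXT)")
conn.executemany("INSERT INTO test_table VALUES (?, ?)", ROWS)

QUERY = """
SELECT attr,
       -- D: pivot with CASE instead of FILTER; COALESCE turns NULL into 0
       COALESCE(MAX(CASE WHEN day_number = 0 THEN cnt END), 0)     AS day1_count,
       COALESCE(MAX(CASE WHEN day_number = 0 THEN percent END), 0) AS day1_percent,
       COALESCE(MAX(CASE WHEN day_number = 1 THEN cnt END), 0)     AS day2_count,
       COALESCE(MAX(CASE WHEN day_number = 1 THEN percent END), 0) AS day2_percent
FROM (
    SELECT DISTINCT
           attr,
           -- B: days between the newest date in the table and this row's date
           CAST(julianday(MAX(date(time)) OVER ())
                - julianday(date(time)) AS INTEGER) AS day_number,
           -- A: rows per day and attr
           COUNT(*) OVER (PARTITION BY date(time), attr) AS cnt,
           -- C: share of the day's rows, rounded to two decimals
           ROUND(1.0 * COUNT(*) OVER (PARTITION BY date(time), attr)
                     / COUNT(*) OVER (PARTITION BY date(time)), 2) AS percent
    FROM test_table
) s
GROUP BY attr
ORDER BY attr
"""

pivot = conn.execute(QUERY).fetchall()
for row in pivot:
    print(row)
```

On the sample data this gives one row per attr with the newest day in the day1 columns, matching the layout in the question, except that the percentages are fractions (0.67 rather than 66.6%).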
Answer 3: (score: 0)
Basically, I see this as conditional aggregation. But you need an enumerator for the days in order to pivot. So:
SELECT attr,
COUNT(*) FILTER (WHERE day_number = 1) as day1_count,
COUNT(*) FILTER (WHERE day_number = 1) / cnt as day1_percent,
COUNT(*) FILTER (WHERE day_number = 2) as day2_count,
COUNT(*) FILTER (WHERE day_number = 2) / cnt as day2_percent
FROM (SELECT attr,
DENSE_RANK() OVER (ORDER BY time::date DESC) as day_number,
1.0 * COUNT(*) OVER (PARTITION BY attr) as cnt
FROM test_table
) s
GROUP BY attr, cnt
ORDER BY attr;
Here is a SQL Fiddle.
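One detail to be aware of: cnt above is partitioned by attr, so the ratio is the share of an attr's own rows that fall on each day, whereas the expected output in the question divides by the total for the day. The sketch below puts both denominators side by side for day 1. It is only an illustration under assumptions: it runs in SQLite from Python (window functions need version 3.25 or later), uses `date(time)` for `time::date` and SUM(CASE ...) in place of COUNT(*) FILTER, and the names `attr_total`, `day_total`, and the two share columns are hypothetical.

```python
import sqlite3

ROWS = [
    ("abc", "2018-08-06 10:17:25.282546"),
    ("def", "2018-08-06 10:17:25.325676"),
    ("pqr", "2018-08-05 10:17:25.366823"),
    ("abc", "2018-08-06 10:17:25.407941"),
    ("def", "2018-08-05 10:17:25.449249"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_table (attr TEXT, time TEXT)")
conn.executemany("INSERT INTO test_table VALUES (?, ?)", ROWS)

QUERY = """
SELECT attr,
       SUM(CASE WHEN day_number = 1 THEN 1 ELSE 0 END) AS day1_count,
       -- denominator as in the answer: all rows of this attr
       1.0 * SUM(CASE WHEN day_number = 1 THEN 1 ELSE 0 END)
           / attr_total AS day1_share_of_attr,
       -- denominator as in the question: all rows of that day
       COALESCE(1.0 * SUM(CASE WHEN day_number = 1 THEN 1 ELSE 0 END)
                    / MAX(CASE WHEN day_number = 1 THEN day_total END),
                0) AS day1_share_of_day
FROM (SELECT attr,
             -- 1 = newest date, 2 = the date before it, and so on
             DENSE_RANK() OVER (ORDER BY date(time) DESC) AS day_number,
             COUNT(*) OVER (PARTITION BY attr) AS attr_total,
             COUNT(*) OVER (PARTITION BY date(time)) AS day_total
      FROM test_table
     ) s
GROUP BY attr, attr_total
ORDER BY attr
"""

shares = conn.execute(QUERY).fetchall()
for row in shares:
    print(row)
```

For abc, both of its rows are on day 1, so the per-attr share is 1.0 while the per-day share is 2 out of 3. Which of the two is wanted decides how the window for the denominator should be partitioned.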