Irregular grouping of a timestamp variable

Time: 2019-10-16 14:54:30

Tags: sql postgresql

I have a table organized as follows:

id       lateAt
1231235  2019/09/14
1242123  2019/09/13
3465345  NULL
5676548  2019/09/28
8986475  2019/09/23

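lateAt is the timestamp at which a payment on a given loan became late. So for any current date (I need to look at these numbers every day) there is a certain number of entries that are 0-15, 15-30, 30-45, 45-60, 60-90, and 90+ days late.

This is my desired output: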

lateGroup   Count
0-15        20
15-30       22
30-45       25
45-60       32
60-90       47
90+         57

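I can compute this easily in R, but to get the results back into a BI dashboard I would have to create a new table in the database, which I don't think is good practice. What is a SQL-native way to solve this problem?

4 Answers:

Answer 0: (score: 3)

I would use ranges to define the "late groups" and join on the number of days late: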

with groups (grp) as (
  values 
    (int4range(0,15, '[)')),
    (int4range(15,30, '[)')),
    (int4range(30,45, '[)')),
    (int4range(45,60, '[)')),
    (int4range(60,90, '[)')),
    (int4range(90,null, '[)'))
)
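-- column names follow the question (id, lateAt); the ::date cast makes the
-- subtraction yield an integer day count even if lateAt is a timestamp
select grp, count(t.id)
from groups g
  left join the_table t on g.grp @> (current_date - t.lateAt::date)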
group by grp
order by grp;

int4range(0,15, '[)') creates a range from 0 (inclusive) to 15 (exclusive).

Online example: https://rextester.com/QJSN89445
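
To make the bound semantics concrete, here is a small sketch using PostgreSQL's containment operator `@>` on the first range from the query above. It only illustrates how '[)' treats the two endpoints and is not part of the original answer.

```sql
-- '[)' means the lower bound is included and the upper bound is excluded,
-- so a payment that is exactly 15 days late falls into the 15-30 group.
select int4range(0, 15, '[)')  @> 0  as lower_in_first_group,   -- true
       int4range(0, 15, '[)')  @> 15 as upper_in_first_group,   -- false
       int4range(15, 30, '[)') @> 15 as lower_in_second_group;  -- true
```

Because adjacent ranges share a boundary value that belongs to only one of them, every day count lands in exactly one group.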

Answer 1: (score: 2)

You didn't mention which DBMS you are using, but almost every DBMS has a construct like this, known as a "value constructor":

select bins.lateGroup, bins.minVal, bins.maxVal FROM
    (VALUES 
        ('0-15',0,15),
        ('15-30',15.0001,30),  -- increase by a small fraction so bins don't overlap
        ('30-45',30.0001,45),
        ('45-60',45.0001,60),
        ('60-90',60.0001,90),
        ('90-99999',90.0001,99999)
    ) AS bins(lateGroup,minVal,maxVal)

If your DBMS doesn't have it, you can probably use UNION ALL instead:

SELECT '0-15' as lateGroup, 0 as minVal, 15 as maxVal
union all SELECT '15-30',15,30
union all SELECT '30-45',30,45

The complete query, including the sample data you provided, would then look like this:

--- example from SQL Server 2012 SP1
--- first let's set up some sample data
create table #temp (id int, lateAt datetime);
INSERT #temp (id, lateAt) values
   (1231235,'2019-09-14'),
   (1242123,'2019-09-13'),
   (3465345,NULL),
   (5676548,'2019-09-28'),
   (8986475,'2019-09-23');

--- here's the actual query
select lateGroup, count(*) as Count
from #temp as T,
    (VALUES
        ('0-15',0,15),
        ('15-30',15.0001,30),  -- increase by a small fraction so bins don't overlap
        ('30-45',30.0001,45),
        ('45-60',45.0001,60),
        ('60-90',60.0001,90),
        ('90-99999',90.0001,99999)
    ) AS bins(lateGroup,minVal,maxVal)
where datediff(day,lateAt,getdate()) between minVal and maxVal
group by lateGroup
order by lateGroup

--- remove our sample data
drop table #temp;

Here is the output:

lateGroup  Count
15-30      2
30-45      2

Note: rows where lateAt is NULL are not counted.
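
Since the question is tagged postgresql, the same value-constructor idea might look roughly like this in PostgreSQL. This is a minimal sketch, not the original answer's code: the table name loans is a hypothetical placeholder, lateAt is assumed to be a date or timestamp column, and half-open bins (minVal inclusive, maxVal exclusive) replace the 15.0001 workaround.

```sql
-- Assumption: the data lives in a table loans(id, lateAt); adjust to your schema.
select bins.lateGroup, count(*) as cnt
from loans as t
join (values
        ('0-15',   0, 15),
        ('15-30', 15, 30),
        ('30-45', 30, 45),
        ('45-60', 45, 60),
        ('60-90', 60, 90),
        ('90+',   90, null)   -- NULL upper bound means "no upper limit"
     ) as bins(lateGroup, minVal, maxVal)
  -- current_date minus a date yields an integer number of days in PostgreSQL
  on current_date - t.lateAt::date >= bins.minVal
 and (bins.maxVal is null or current_date - t.lateAt::date < bins.maxVal)
group by bins.lateGroup, bins.minVal
order by bins.minVal;
```

Ordering by minVal rather than by the label keeps the groups in numeric order; sorting the text labels would not put '90+' reliably last once more bins are added.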

Answer 2: (score: 2)

A quick and dirty way to do it in SQL is:

SELECT '0-15'   AS lateGroup, 
       COUNT(*) AS lateGroupCount
FROM my_table t
WHERE (CURRENT_DATE - t.lateAt) >= 0
  AND (CURRENT_DATE - t.lateAt) <  15

UNION

SELECT '15-30'  AS lateGroup, 
       COUNT(*) AS lateGroupCount
FROM my_table t
WHERE (CURRENT_DATE - t.lateAt) >= 15
  AND (CURRENT_DATE - t.lateAt) <  30

UNION

SELECT '30-45'  AS lateGroup, 
       COUNT(*) AS lateGroupCount
FROM my_table t
WHERE (CURRENT_DATE - t.lateAt) >= 30
  AND (CURRENT_DATE - t.lateAt) <  45

-- Etc...

For production code, you would probably want to do something more like Ross's answer.
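
If the repeated scans above become a concern, the same buckets can be computed in a single pass with a CASE expression. This is only a sketch under the same assumptions as the query above (a table my_table with a lateAt column, PostgreSQL date arithmetic); it swaps the UNION of separate queries for one grouped query.

```sql
select case
         when d < 15 then '0-15'
         when d < 30 then '15-30'
         when d < 45 then '30-45'
         when d < 60 then '45-60'
         when d < 90 then '60-90'
         else '90+'
       end      as lateGroup,
       count(*) as lateGroupCount
from (
    -- d is the number of days late; rows that are not late (NULL) are skipped
    select current_date - t.lateAt::date as d
    from my_table t
    where t.lateAt is not null
) s
where d >= 0
group by 1
order by min(d);
```

Unlike the UNION version, a group with no rows simply does not appear in the result instead of showing a count of 0.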

Answer 3: (score: 1)

I think you can do it all in one clear query:

with cte_lategroup as
(
    select *
    from (values(0,15,'0-15'),(15,30,'15-30'),(30,45,'30-45')) as t (mini, maxi, designation)
)
select 
    t2.designation
    , count(*)
from test t
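    -- inner join: rows with a NULL lateat, or outside every bin,
    -- are dropped instead of being counted under a NULL designation
    join cte_lategroup t2
    on current_date - t.lateat >= t2.mini
    and current_date - t.lateat < t2.maxi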
group by t2.designation;

With a setup similar to yours:

create table test
(
    id int
    , lateAt date
);

insert into test
values (1231235,  to_date('2019/09/14', 'yyyy/mm/dd'))
,(1242123,  to_date('2019/09/13', 'yyyy/mm/dd'))
,(3465345,  null)
,(5676548,  to_date('2019/09/28', 'yyyy/mm/dd'))
,(8986475,  to_date('2019/09/23', 'yyyy/mm/dd'));