Repeating a value across multiple dates in SQL

Time: 2018-10-15 13:17:41

Tags: sql amazon-athena presto

I have a table that looks like this:

ID            Type               Change_Date
1              t1                2015-10-08
1              t2                2016-01-03
1              t3                2016-03-07
2              t1                2017-12-13
2              t2                2018-02-01

It shows whether a customer changed their account type, and when. However, I want a query that gives me the following output for each ID:

ID            Type               Change_Date
1              t1                2015-10
1              t1                2015-11
1              t1                2015-12
1              t2                2016-01
1              t2                2016-02
1              t3                2016-03
1              t3                2016-04
...            ...               ...
1              t3                2018-10

The output shows which account type the customer had in each month, up to the current month. My problem is filling in the "empty" months. In some cases, the gap between two account changes can be more than a year.

I hope this makes sense.

Thanks in advance.

1 answer:

Answer 0: (score: 2)

Based on Presto SQL (since your original question is tagged Presto / SQL).


Update 2018-11-01: simplified the SQL by using lead()


Prepare the data

mytable is the same as your table:

id  type  update_date
1   t1    2015-10-08
1   t2    2016-01-03
1   t3    2016-03-07
2   t1    2017-12-13
2   t2    2018-02-01
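
If you want to try the steps below without creating a real table, one option is an inline VALUES list. This is only a sketch: it assumes update_date is a DATE column (the question does not state the column type), and the alias t is arbitrary.

```sql
-- Inline stand-in for mytable; assumes update_date is of type DATE.
select id, type, update_date
from (
    values
        (1, 't1', date '2015-10-08'),
        (1, 't2', date '2016-01-03'),
        (1, 't3', date '2016-03-07'),
        (2, 't1', date '2017-12-13'),
        (2, 't2', date '2018-02-01')
) as t (id, type, update_date)
```

Wrapping this select in a CTE named mytable lets the later queries run unchanged.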

t_month is a dictionary table that holds every month from 2015-01 to 2019-12. A dictionary table like this is handy to have around.

ym
2015-01
2015-02
2015-03
2015-04
2015-05
2015-06
2015-07
2015-08
2015-09
...
2019-12
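
If t_month does not exist yet, it can probably be generated instead of maintained by hand. A minimal sketch, assuming your Presto / Athena version provides sequence() with a month interval step together with unnest(); the aliases t and d are only for illustration:

```sql
-- One row per month from 2015-01 to 2019-12, formatted like the ym column above.
select date_format(d, '%Y-%m') as ym
from unnest(
    sequence(date '2015-01-01', date '2019-12-01', interval '1' month)
) as t (d)
```

You could wrap this in a CTE named t_month or materialize it as a table, whichever suits your setup.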

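Add a lifespan to mytable

Normally you would "manage" this kind of data as a lifecycle, with a start and an end for each row. So mytable would look like this: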

id  type   start_date      end_date
1   t1     2015-10-08      2016-01-03
1   t2     2016-01-03      2016-03-07
1   t3     2016-03-07      null
2   t1     2017-12-13      2018-02-01
2   t2     2018-02-01      null

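But in this case you don't have that, so the next step is to "create" it with the lead() window function. When there is no later change for an id, end_month defaults to next month, so the current month is still covered.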

select 
    id,
    type, 
    date_format(update_date, '%Y-%m') as start_month,
    lead(
        date_format(update_date, '%Y-%m'), 
        1, -- next one
        date_format(current_date+interval '1' month, '%Y-%m') -- if null return next month
    ) over(partition by id order by update_date) as end_month
from mytable

Output

id  type  start_month  end_month
1   t1    2015-10     2016-01
1   t2    2016-01     2016-03
1   t3    2016-03     2018-11
2   t1    2017-12     2018-02
2   t2    2018-02     2018-11

Cross join id and month

This part is simple:

with id_month as (
    select * from t_month 
    cross join (select distinct id from mytable)
)
select * from id_month

Output

ym      id
2015-01 1
2015-02 1
2015-03 1
...
2019-12 1
2015-01 2
2015-02 2
2015-03 2
...
2019-12 2

Finally

Now you can use a subquery in the select clause:

select 
    id,
    type,
    ym
from (
    select
        t1.id,
        t1.ym,
        (select type from mytable2 where t1.id = id and t1.ym >= start_month and t1.ym < end_month) as type
    from id_month t1
)
where type is not null
-- order by id, ym

Full SQL

with mytable2 as (
    select 
        id,
        type, 
        date_format(update_date, '%Y-%m') as start_month,
        lead(
            date_format(update_date, '%Y-%m'), 
            1, -- next one
            date_format(current_date+interval '1' month, '%Y-%m') -- if null return next month
        ) over(partition by id order by update_date) as end_month
    from mytable
)
, id_month as (
    select * from t_month 
    cross join (select distinct id from mytable)
)
select 
    id,
    type,
    ym
from (
    select
        t1.id,
        t1.ym,
        (select type from mytable2 where t1.id = id and t1.ym >= start_month and t1.ym < end_month) as type
    from id_month t1
)
where type is not null
--order by id, ym

Output

id  type  ym
1   t1    2015-10
1   t1    2015-11
1   t1    2015-12
1   t2    2016-01
1   t2    2016-02
1   t3    2016-03
1   t3    2016-04
...
1   t3    2018-10
2   t1    2017-12
2   t1    2018-01
2   t2    2018-02
...
2   t2    2018-10
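
As a possible alternative to the scalar subquery in the select clause, the same range condition can be written as a plain inner join. That makes both the id_month cross join and the "where type is not null" filter unnecessary, because months outside every range simply find no match. This is a sketch, not part of the original answer; it assumes mytable and t_month exist as described above, and the two forms have not been compared for performance here.

```sql
with mytable2 as (
    select
        id,
        type,
        date_format(update_date, '%Y-%m') as start_month,
        lead(
            date_format(update_date, '%Y-%m'),
            1, -- next change for the same id
            date_format(current_date + interval '1' month, '%Y-%m') -- no later change: use next month
        ) over (partition by id order by update_date) as end_month
    from mytable
)
select
    t.id,
    t.type,
    m.ym
from mytable2 t
join t_month m
    on m.ym >= t.start_month  -- start month is inclusive
   and m.ym <  t.end_month    -- end month is exclusive
order by t.id, m.ym
```

The comparison works on strings because the 'YYYY-MM' format sorts in chronological order.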