Split IDs by quarter and flag each quarter's activity as 1 or 0

Asked: 2018-02-11 15:13:13

Tags: sql hive hiveql

We have two columns: an id and a month.

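The output I am looking for breaks each ID's years down to quarter granularity. The activity column should be derived from the quarter: if the id is active in that quarter, activity should be 1, otherwise 0. If the id appears in any one month of a quarter (for example, only month 1 of Q1), activity is still 1.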

Sample data:

id           month_dt
-----------------------------------
1000000000   2012-03-01 00:00:00.0
1000000000   2015-09-01 00:00:00.0
1000000000   2016-10-01 00:00:00.0
1000000000   2015-11-01 00:00:00.0
1000000000   2014-01-01 00:00:00.0
1000000000   2013-04-01 00:00:00.0
1000000000   2014-12-01 00:00:00.0
1000000000   2015-02-01 00:00:00.0
1000000000   2014-06-01 00:00:00.0
1000000000   2013-01-01 00:00:00.0
1000000000   2014-05-01 00:00:00.0
1000000000   2016-05-01 00:00:00.0
1000000000   2013-07-01 00:00:00.0

Expected result:

ID           YEAR     QTR      ACTIVITY (1 or 0)
--------------------------------------------------
1000000000   2012      1          1
1000000000   2012      2          0
1000000000   2012      3          0
1000000000   2012      4          0
1000000000   2013      1          1
1000000000   2013      2          1
1000000000   2013      3          1
1000000000   2013      4          0

Below is what I tried, but it does not return the expected result. Please help me achieve this.

SELECT
    a.id, a.year,
    SUM(CASE WHEN quarter BETWEEN 1 AND 3 THEN 1 ELSE 0 END) AS Q1,
    SUM(CASE WHEN quarter BETWEEN 4 AND 6 THEN 1 ELSE 0 END) AS Q2,
    SUM(CASE WHEN quarter BETWEEN 7 AND 9 THEN 1 ELSE 0 END) AS Q3,
    SUM(CASE WHEN quarter BETWEEN 10 AND 12 THEN 1 ELSE 0 END) AS Q4
FROM
    (SELECT
         id, 
         TRIM(SUBSTRING(month_id, 1, 4)) AS year,
         TRIM(regexp_replace(SUBSTR(month_id, 5, 4), "-", "")) as quarter    
     FROM
         test.patientid) a
GROUP BY 
    a.id, a.year

2 Answers:

Answer 0: (score: 1)

I think you are looking for something like this:

select y.yyyy, q.q,
       (case when count(t.month_dt) > 0 then 1 else 0 end) as activity_flag
from (select distinct year(month_dt) as yyyy from t) y cross join
     (select distinct quarter(month_dt) as q from t) q left join
     t
     on year(t.month_dt) = y.yyyy and quarter(t.month_dt) = q.q
group by y.yyyy, q.q;

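This assumes that each of the four quarters has at least one activity somewhere in the data (regardless of the year), because the list of quarters is derived from the rows themselves. Otherwise, you just need to supply the list 1, 2, 3 and 4 yourself to get the quarters.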
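As a rough sketch of that last suggestion, the quarter list can be hard-coded instead of derived from the data, and the id can be carried through so the output matches the ID / YEAR / QTR / ACTIVITY layout in the question. This is an untested variation rather than the answerer's own query. It assumes a table named t with columns id and month_dt, and a Hive version that provides quarter() and allows SELECT without FROM.

```sql
-- Assumption: table t(id, month_dt); quarter() is available in this Hive version.
select y.id,
       y.yyyy as year,
       q.q    as qtr,
       -- count() ignores the NULLs produced by the left join,
       -- so a quarter with no matching rows gets 0
       (case when count(t.month_dt) > 0 then 1 else 0 end) as activity
from (select distinct id, year(month_dt) as yyyy from t) y
cross join
     -- hard-coded list of quarters, independent of what the data contains
     (select 1 as q union all
      select 2 as q union all
      select 3 as q union all
      select 4 as q) q
left join t
     on t.id = y.id
    and year(t.month_dt) = y.yyyy
    and quarter(t.month_dt) = q.q
group by y.id, y.yyyy, q.q;
```

Taking the years per id (rather than across the whole table) means an id only gets rows for years in which it has at least one record, which is what the expected result in the question appears to show.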

Answer 1: (score: 0)

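@Babu: if the quarter function does not exist in your Hive version, here is an alternative way to get the quarter for a given date. Hope this helps. Thanks!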

create table qtrs(qtr int);
insert into qtrs  values (1),(2),(3),(4);

create table ims 
(id int,
month_dt date
);

insert into ims values
(100, '2012-03-01'), 
(100, '2013-04-01'), 
(100, '2013-01-01'), 
(100, '2013-07-01'), 
(100, '2014-01-01'), 
(100, '2014-05-01'), 
(100, '2014-06-01'), 
(100, '2014-12-01'), 
(100, '2015-02-01'), 
(100, '2015-09-01'), 
(100, '2015-11-01'), 
(100, '2016-05-01'), 
(100, '2016-10-01'); 

insert into ims values
(200, '2012-03-01'), 
(200, '2013-04-01'); 

Query:
select DISTINCT NVL(ims.id, qtr.id) as id,qtr.year as year,qtr.qtr as qtr,
IF(ims.id is null, 0, 1) as activity  
from jbacoy.ims ims
right join (select distinct ims.id,YEAR(ims.month_dt) as year,qtrs.qtr from jbacoy.ims ims join jbacoy.qtrs qtrs) qtr 
on (ims.id=qtr.id and year(ims.month_dt)=qtr.year and int((month(month_dt)-1)/3)+1=qtr.qtr)
order by id, year, qtr;

Result:
id      year    qtr     activity
100     2012    1       1
100     2012    2       0
100     2012    3       0
100     2012    4       0
100     2013    1       1
100     2013    2       1
100     2013    3       1
100     2013    4       0
100     2014    1       1
100     2014    2       1
100     2014    3       0
100     2014    4       1
100     2015    1       1
100     2015    2       0
100     2015    3       1
100     2015    4       1
100     2016    1       0
100     2016    2       1
100     2016    3       0
100     2016    4       1
200     2012    1       1
200     2012    2       0
200     2012    3       0
200     2012    4       0
200     2013    1       0
200     2013    2       1
200     2013    3       0
200     2013    4       0
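
The quarter arithmetic used in the join condition of this answer can be checked on its own. The sketch below assumes the ims table created above and uses cast(floor(...) as int), which should be equivalent to the int(...) call in the query; it lists each distinct month in the data next to the quarter the formula assigns to it.

```sql
-- Assumption: the ims table from the setup above exists in the current database.
-- (month - 1) / 3, rounded down, plus 1 maps months 1-3 to quarter 1,
-- 4-6 to quarter 2, 7-9 to quarter 3 and 10-12 to quarter 4.
select distinct
       month(month_dt) as month_num,
       cast(floor((month(month_dt) - 1) / 3) as int) + 1 as qtr
from ims
order by month_num;
```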