Query to divide data by quarter

Time: 2018-02-12 01:37:49

Tags: mysql sql hive hiveql impala

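We have two columns, id and monthid.

The output I am looking for splits the months in monthid by quarter, per year. The output column should be the quarter. If the id is active in that quarter, the output should be 1, otherwise 0. If the id appears in any one month of a quarter (for example, only month 1 of the first quarter), the output is still 1.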
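The quarter grouping described above comes down to simple arithmetic on the month number. A minimal sketch follows, assuming a dialect that allows SELECT without a FROM clause (Hive 0.13+, MySQL, Impala); the inline list of months is made up purely for illustration:

```sql
-- Months 1-3 map to quarter 1, 4-6 to 2, 7-9 to 3, 10-12 to 4.
SELECT m,
       FLOOR((m - 1) / 3) + 1 AS qtr
FROM (
    SELECT 1 AS m
    UNION ALL SELECT 3 AS m
    UNION ALL SELECT 4 AS m
    UNION ALL SELECT 12 AS m
) months;
-- Expected mapping: 1 -> 1, 3 -> 1, 4 -> 2, 12 -> 4
```

Against a real table, m would be replaced by MONTH(monthid).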

Like this:

id           month
-----------------------------------
100   2012-03-01 00:00:00.0
100   2015-09-01 00:00:00.0
100   2016-10-01 00:00:00.0
100   2015-11-01 00:00:00.0
100   2014-01-01 00:00:00.0
100   2013-04-01 00:00:00.0
100   2014-12-01 00:00:00.0
100   2015-02-01 00:00:00.0
100   2014-06-01 00:00:00.0
100   2013-01-01 00:00:00.0
100   2014-05-01 00:00:00.0
100   2016-05-01 00:00:00.0
100   2013-07-01 00:00:00.0

The result should be:

ID    YEAR     QTR      output (1 or 0)
--------------------------------------------------
100   2012      1          1
100   2012      2          0
100   2012      3          0
100   2012      4          0
100   2013      1          1
100   2013      2          1
100   2013      3          1
100   2013      4          0

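Below is the query I tried, but it does not return the expected result. Please help me achieve this. I also want the rows where the output is 0.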

select a.id,a.year,a.month,
CASE WHEN a.month BETWEEN 1 AND 4 THEN 1 
 ELSE 0 END as output
from
(select id,trim(substring(claim_month_id,1,4)) as year,(INT((MONTH(monthid)-1)/3)+1) as month from test) a
group by a.id,a.year,a.month

Any help would be greatly appreciated.

1 answer:

Answer 0: (score: 0)

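@Ani; Hive has no hierarchical query that could generate the four quarters (1, 2, 3, 4), so I created a small table for them. Then I get every patient_id, year and month present in the ims_patient_activity_diagnosis table. Finally, I do a right join against all possible combinations of patient id, year and quarter (1, 2, 3, 4). If a combination has no matching id, year and quarter on the left side of the right join, there was no activity for that id, year and quarter, and I assign activity = 0 to those rows. I also inserted patient id = 200 to test the query with more than one patient id in the table. Hope this helps. Thanks.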

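-- Helper table holding the four quarters (the column is named month, but it stores quarter numbers).
create table dbo.qtrs(month int);
insert into dbo.qtrs values (1),(2),(3),(4);

-- qtr: every (id, year) present in the data, paired with each quarter 1-4.
-- The right join keeps all of those combinations; ims.id is NULL where no activity matches.
select DISTINCT NVL(ims.id, qtr.id) as patient_id,
qtr.year as year,
qtr.month as month,
CASE WHEN ims.id IS NOT NULL THEN 1 ELSE 0 END as activity
from sandbox_grwi.ims_patient_activity_diagnosis ims
right join (select distinct ims.id,YEAR(ims.month_dt) as year,qtrs.month from sandbox_grwi.ims_patient_activity_diagnosis ims cross join dbo.qtrs qtrs) qtr
on (ims.id=qtr.id and YEAR(ims.month_dt)=qtr.year and INT((MONTH(ims.month_dt)-1)/3)+1=qtr.month)
-- order by gives a total ordering; sort by only orders rows within each reducer.
order by patient_id, year, month;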

Sample Result:
p_id    year    month   activity
100     2012    1       1
100     2012    2       0
100     2012    3       0
100     2012    4       0
100     2013    1       1
100     2013    2       1
100     2013    3       1
100     2013    4       0
100     2014    1       1
100     2014    2       1
100     2014    3       0
100     2014    4       1
100     2015    1       1
100     2015    2       0
100     2015    3       1
100     2015    4       1
100     2016    1       0
100     2016    2       1
100     2016    3       0
100     2016    4       1
200     2012    1       1
200     2012    2       0
200     2012    3       0
200     2012    4       0
200     2013    1       0
200     2013    2       1
200     2013    3       0
200     2013    4       0
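
For comparison, here is a sketch of a variant that avoids the helper table by building the four quarters inline and collapsing duplicates with GROUP BY instead of DISTINCT. It reuses the table and column names from the answer (sandbox_grwi.ims_patient_activity_diagnosis, id, month_dt); the aliases y, q, g and a are introduced here only for illustration. It is untested against the original data and assumes the Hive session permits Cartesian products (strict mode may reject the CROSS JOIN).

```sql
SELECT g.id   AS patient_id,
       g.year AS year,
       g.qtr  AS qtr,
       -- COUNT(a.id) ignores NULLs, so it is 0 when no activity row matched
       CASE WHEN COUNT(a.id) > 0 THEN 1 ELSE 0 END AS activity
FROM (
    -- every (id, year) seen in the data, paired with each of the four quarters
    SELECT y.id, y.year, q.qtr
    FROM (SELECT DISTINCT id, YEAR(month_dt) AS year
          FROM sandbox_grwi.ims_patient_activity_diagnosis) y
    CROSS JOIN (SELECT 1 AS qtr
                UNION ALL SELECT 2 AS qtr
                UNION ALL SELECT 3 AS qtr
                UNION ALL SELECT 4 AS qtr) q
) g
LEFT JOIN (
    -- one row per activity record, tagged with its year and quarter
    SELECT id,
           YEAR(month_dt) AS year,
           FLOOR((MONTH(month_dt) - 1) / 3) + 1 AS qtr
    FROM sandbox_grwi.ims_patient_activity_diagnosis
) a
  ON a.id = g.id AND a.year = g.year AND a.qtr = g.qtr
GROUP BY g.id, g.year, g.qtr
ORDER BY patient_id, year, qtr;
```

The grid-then-left-join shape is the same idea as the right join in the answer, just written with the complete grid on the left side.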


Additional sample data:
insert into sandbox_grwi.ims_patient_activity_diagnosis values
(200, '2012-03-01'), 
(200, '2013-04-01');
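
To reproduce the sample result end to end, the table could be set up roughly as follows. The table definition is not given anywhere in the thread, so the column types here (INT and STRING) are assumptions, and the sandbox_grwi database is assumed to exist already. The rows for id 100 are the ones listed in the question.

```sql
-- Hypothetical DDL: the real column types are not shown in the thread.
CREATE TABLE IF NOT EXISTS sandbox_grwi.ims_patient_activity_diagnosis (
    id       INT,
    month_dt STRING   -- YEAR() and MONTH() accept 'yyyy-MM-dd' strings in Hive
);

-- Sample rows for id 100, taken from the question.
INSERT INTO sandbox_grwi.ims_patient_activity_diagnosis VALUES
(100, '2012-03-01'),
(100, '2015-09-01'),
(100, '2016-10-01'),
(100, '2015-11-01'),
(100, '2014-01-01'),
(100, '2013-04-01'),
(100, '2014-12-01'),
(100, '2015-02-01'),
(100, '2014-06-01'),
(100, '2013-01-01'),
(100, '2014-05-01'),
(100, '2016-05-01'),
(100, '2013-07-01');
```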