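I have the following query that works fine in SQL Server, but I'm trying to use it in Hadoop / HiveQL.
However, HiveQL has no TOP 1 or OUTER APPLY. Can anyone suggest an alternative that produces the same result?
Below is the data (table), followed by the result I want to achieve.
Thanks, Danii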
```sql
CREATE TABLE #temp
(
    ID varchar(20) NOT NULL
    ,CreatedDate DATETIME NOT NULL
    ,CompletedDate DATETIME NOT NULL
    ,TYPES varchar(20) NOT NULL
    ,STATUSS varchar(20) NOT NULL
);
INSERT INTO #temp
VALUES ('61030203647','20160427','20160427','Re-Activattion', 'COMP');
INSERT INTO #temp
VALUES ('61030203647','20160425','20160426','Re-Activattion', 'N-CO');
INSERT INTO #temp
VALUES ('61030203647','20160422','20160422','Re-Activattion', 'N-CO');
INSERT INTO #temp
VALUES ('61030203647','20170311','20170613','Re-Activattion', 'COMP');
INSERT INTO #temp
VALUES ('64074558792','20160731','20160805','Re-Activattion','N-CO');
INSERT INTO #temp
VALUES ('64074558792','20160801','20160805','Re-Activattion','PARTIALLY');
INSERT INTO #temp
VALUES ('64074558792','20160809','20160809','Re-Activattion','PARTIALLY');
INSERT INTO #temp
VALUES ('64074558792','20160810','20160810','Re-Activattion','N-CO');
INSERT INTO #temp
VALUES ('64074558792','20160810','20160810','Re-Activattion','N-CO');
INSERT INTO #temp
VALUES ('64074558792','20160811','20160811','Re-Activattion','COMP');
INSERT INTO #temp
VALUES ('64074558792','20160812','20160814','Re-Activattion','N-CO');

;WITH src AS (
    SELECT ID, CreatedDate, CompletedDate, TYPES, STATUSS,
        ROW_NUMBER() OVER(PARTITION BY ID ORDER BY CreatedDate, CompletedDate) AS rn
    FROM #temp
)
,grouped as (
    Select s.*, d.rnGrp from src s
    outer apply (select top 1 rn rnGrp from src s2
                 where s.ID=s2.ID and s2.STATUSS='COMP' and s2.rn>=s.rn) d(rnGrp)
)
,grouped1 as (
    Select ID, min(CreatedDate) CreatedDate, max(CompletedDate) CompletedDate
        ,rnGrp,
        Case when SUM(CASE WHEN STATUSS = 'COMP' THEN 1 ELSE 0 END) > 0 then
            Case when TYPES='De-Activattion' then 'NOT A RE-ACT' else
                CAST(DATEDIFF(day, min(CreatedDate), max(CompletedDate)) AS VARCHAR(25))
            END
        ELSE 'NOT COMPLETED' END AS ACT_COMPLETION_TIME
        ,Sum(CASE WHEN STATUSS = 'N-CO' THEN 1 ELSE 0 END) as [ACT NCO #]
    From grouped
    Group by ID, rnGrp, TYPES
)
,grouped2 as (
    select ID, CreatedDate, CompletedDate, ACT_COMPLETION_TIME, [ACT NCO #]
        ,Count(*) Over(Partition by ID) cnt
        ,row_number() Over(Partition by ID Order by CreatedDate) rn
    from grouped1
)
Select g2.ID,
    Stuff(Convert(varchar(11),g2.CreatedDate,100),4,4,'-') as MIN_CREATED_MONTH_YEAR
    ,g2.ACT_COMPLETION_TIME, g2.[ACT NCO #]
from grouped2 g2
left join grouped2 g3 on g2.ID=g3.ID and g2.rn=g3.rn+1
```
Desired result:

```sql
CREATE TABLE #temp2
(
    ID varchar(20) NOT NULL
    ,MIN_CREATED_MONTH_YEAR varchar(20)
    ,ACT_COMPLETION_TIME varchar(20)
    ,ACT_NCO varchar(20)
);
INSERT INTO #temp2
VALUES ('61030203647','Apr-2016','5','2');
INSERT INTO #temp2
VALUES ('61030203647','Mar-2017','94','0');
INSERT INTO #temp2
VALUES ('64074558792','Jul-2016','11','3');
SELECT *
FROM #temp2
```
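I want to add a 2-week tolerance when grouping by ID. The attached query does what it needs to, and I know how to add the 2-week tolerance at the end (I haven't included it here so as not to complicate things).
My explanation is below; basically, all I really need is the OUTER APPLY and TOP 1 written in HiveQL, since those don't exist there.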
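Explanation: for 61030203647, a Re-Act request was created in 2016 and another one in 2017, more than 2 weeks apart, so they can be treated as 2 separate issues. The first one should be 5 days (from the 22nd, the created date, to the 27th, the max completed date).
However, 64074558792 was created on 31.07 and completed on 11.08, and then another Re-Act was created on the 12th. This was possibly a mistake. It falls within the 2-week tolerance: if a request is created within 2 weeks of the completion date, treat them as the same issue (unlike the example above, which was more than 2 weeks apart and is split into 2 requests).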
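The OUTER APPLY / TOP 1 lookup asks, for each row, for the `rn` of the next 'COMP' row at or after it within the same ID. One way to get that without APPLY is a windowed `MIN` over a frame running from the current row to the end of the partition: rows with no later 'COMP' row get NULL, just as OUTER APPLY returns NULL for them. Since `TOP 1` without `ORDER BY` does not formally guarantee which row comes back, `MIN` also makes the "first following COMP" intent explicit. This is a minimal sketch, not tested on a real cluster, assuming Hive 0.14 or later (for `CREATE TEMPORARY TABLE` and `INSERT ... VALUES`). The table name `temp_requests` is a hypothetical stand-in for `#temp`, and the dates are stored as ISO strings.

```sql
-- Hypothetical Hive stand-in for #temp; temp_requests is an illustrative name.
-- Dates are 'yyyy-MM-dd' strings, so they sort chronologically as text.
CREATE TEMPORARY TABLE temp_requests (
    ID            STRING,
    CreatedDate   STRING,
    CompletedDate STRING,
    TYPES         STRING,
    STATUSS       STRING
);

INSERT INTO temp_requests VALUES
    ('61030203647', '2016-04-27', '2016-04-27', 'Re-Activattion', 'COMP'),
    ('61030203647', '2016-04-25', '2016-04-26', 'Re-Activattion', 'N-CO'),
    ('61030203647', '2016-04-22', '2016-04-22', 'Re-Activattion', 'N-CO'),
    ('61030203647', '2017-03-11', '2017-06-13', 'Re-Activattion', 'COMP'),
    ('64074558792', '2016-07-31', '2016-08-05', 'Re-Activattion', 'N-CO'),
    ('64074558792', '2016-08-01', '2016-08-05', 'Re-Activattion', 'PARTIALLY'),
    ('64074558792', '2016-08-09', '2016-08-09', 'Re-Activattion', 'PARTIALLY'),
    ('64074558792', '2016-08-10', '2016-08-10', 'Re-Activattion', 'N-CO'),
    ('64074558792', '2016-08-10', '2016-08-10', 'Re-Activattion', 'N-CO'),
    ('64074558792', '2016-08-11', '2016-08-11', 'Re-Activattion', 'COMP'),
    ('64074558792', '2016-08-12', '2016-08-14', 'Re-Activattion', 'N-CO');

WITH src AS (
    SELECT ID, CreatedDate, CompletedDate, TYPES, STATUSS,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY CreatedDate, CompletedDate) AS rn
    FROM temp_requests
),
grouped AS (
    SELECT s.*,
           -- Replaces OUTER APPLY (SELECT TOP 1 rn ... WHERE s2.rn >= s.rn):
           -- smallest rn of a 'COMP' row from the current row to the end of
           -- this ID's partition; NULL when no later 'COMP' row exists.
           MIN(CASE WHEN s.STATUSS = 'COMP' THEN s.rn END)
               OVER (PARTITION BY s.ID ORDER BY s.rn
                     ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS rnGrp
    FROM src s
)
SELECT ID, CreatedDate, CompletedDate, STATUSS, rn, rnGrp
FROM grouped
ORDER BY ID, rn;
```

For the sample data this should give `rnGrp` = 3 for the three April 2016 rows of 61030203647 and 4 for its March 2017 row; for 64074558792, rows 1 to 6 get 6 and the 2016-08-12 row gets NULL, which is what the OUTER APPLY produces. The later CTEs could then read from `grouped` with only the T-SQL-specific pieces translated: Hive's `datediff(enddate, startdate)` takes its arguments in the reverse order of `DATEDIFF(day, start, end)`, `date_format(CreatedDate, 'MMM-yyyy')` (Hive 1.2.0+) can stand in for the `STUFF`/`CONVERT` expression, and the bracketed alias `[ACT NCO #]` needs backticks or a plain name such as `ACT_NCO`.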