Adding a column with filled-in days between start_date and end_date

Time: 2017-08-14 20:26:32

Tags: sql hadoop impala

I have an input table with this structure:

acct_id pvt_data_id pvt_pref_ind start_dttm end_dttm load_dttm pr_load_time
4174878 26  Y   20101126144142  99991231235959  20170527000000  2017052700
4174878 26  Y   20101126144142  99991231235959  20170528000000  2017052800
4174878 26  Y   20101126144142  99991231235959  20170530000000  2017053000
3212472 26  X   20131016144142  99991231235959  20170531000000  2017053100
4174878 26  Y   20101126144142  99991231235959  20170601000000  2017060100
3212472 26  X   20091201142148  99991231235959  20170602000000  2017060200

I am supposed to take this table and create a new table with an additional column pr_day, which holds one day as an integer value (e.g. 20170814) from the range between start_dttm and end_dttm, so that there is a row for each day in that range. I started with the following query to get the range for each group (made up of the first 3 columns):

select acct_id, pvt_data_id, pvt_pref_ind,
       cast(min(substr(cast(start_dttm as string),1,8)) as bigint),
       max(case
             when end_dttm=99991231235959 then cast(from_unixtime(unix_timestamp(now()),'yyyyMMdd') as bigint)
             when end_dttm is null then cast(from_unixtime(unix_timestamp(now()),'yyyyMMdd') as bigint)
             else end_dttm
           end)
from table1
group by acct_id, pvt_data_id, pvt_pref_ind

Note: a value of 99991231235959 or null means that the current date should be used as end_date.

Now I don't know how to continue. Searching led me to a cross join to fill in the dates, but how should I join the tables?

The desired output is something like this:

acct_id pvt_data_id pvt_pref_ind start_dttm end_dttm load_dttm pr_load_time pr_day
4174878 26  Y   20101126144142  99991231235959  20170527000000  2017052700 20101126
4174878 26  Y   20101126144142  99991231235959  20170528000000  2017052800 20101127
4174878 26  Y   20101126144142  99991231235959  20170529000000  2017052900 20101128
4174878 26  Y   20101126144142  99991231235959  20170530000000  2017053000 20101129
3212472 26  X   20131016144142  99991231235959  20170531000000  2017053100 20091202
4174878 26  Y   20101126144142  99991231235959  20170601000000  2017060100 20101130
3212472 26  X   20091201142148  99991231235959  20170602000000  2017060200 20091201

Thanks for any tips and help.

1 Answer:

Answer 0: (score: 0)

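I think the easiest approach is to create a working table that lists all the dates and then join your original table to it. Getting a list of the dates between two dates is discussed in the Stack Overflow post here; we only need to tweak it slightly to make it work for you. For the sake of the answer, I assume you intend to create a new table, so I select the results into a temp table, which you can change to your output table: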

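-- Range to cover: earliest start date in the source table through today
DECLARE @Date1 DATE, @Date2 DATE
SET @Date1 = (SELECT LEFT(MIN(start_dttm), 8) FROM Your_Table) -- e.g. '20150528'
SET @Date2 = GETDATE()

-- One row per day, written to the temp table #daterange.
-- master..spt_values with type = 'P' holds the numbers 0 through 2047,
-- so this produces at most 2048 days starting from @Date1.
SELECT CONVERT(VARCHAR(8), DATEADD(DAY, number, @Date1), 112) + '000000' AS [LongDate]
     , CONVERT(VARCHAR(8), DATEADD(DAY, number, @Date1), 112) AS [ShortDate]
INTO #daterange
FROM master..spt_values
WHERE type = 'P'
  AND DATEADD(DAY, number, @Date1) <= @Date2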
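
For comparison, here is a minimal sketch of the same date list built outside the database, in Python. It is not part of the original answer. It may be useful because the query above uses SQL Server syntax and a numbers table of fixed size, while the sample data starts at 20091201 and runs to the current date, which is more than 2048 days. The function name build_date_range and the fixed end date 2017-08-14 are assumptions made for illustration only.

```python
from datetime import date, timedelta


def build_date_range(first_day: date, last_day: date) -> list[tuple[int, int]]:
    """Return one (long_date, short_date) pair per day, both ends included."""
    rows = []
    current = first_day
    while current <= last_day:
        short_date = int(current.strftime("%Y%m%d"))  # yyyyMMdd as an integer
        long_date = short_date * 1_000_000            # same day with 000000 as the time part
        rows.append((long_date, short_date))
        current += timedelta(days=1)
    return rows


# Assumed range: earliest start_dttm in the sample data through an assumed "today".
calendar = build_date_range(date(2009, 12, 1), date(2017, 8, 14))
print(len(calendar), calendar[0], calendar[-1])
```

The resulting pairs could be loaded into whatever working table the target system supports; how that load is done depends on the platform and is left open here.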

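Then you simply join the new date range table to your original table, using the long date value in the join condition and the short date value to populate the column, like: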

#daterange.longdate between YourTable.start_dttm and YourTable.end_dttm
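
To see the whole join in one place, the following is a rough, self-contained sketch that runs the idea against an in-memory SQLite database from Python. It is only an illustration under stated assumptions, not the answer's SQL Server code and not Impala syntax: the three sample rows, the lowercase table names your_table and daterange, and the fixed "today" of 2017-08-14 are all made up. Two choices differ from the condition above and are deliberate: the comparison is done at day granularity, so the day on which start_dttm falls is included even when its time part is not 000000, and an end_dttm of 99991231235959 or NULL is replaced by the current date, as the question requires.

```python
import sqlite3
from datetime import date, timedelta

SENTINEL = 99991231235959  # "open-ended" marker described in the question

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (acct_id INTEGER, start_dttm INTEGER, end_dttm INTEGER)"
)
# Hypothetical rows, chosen only to exercise the three end_dttm cases.
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [
        (1, 20170101120000, 20170103080000),  # closed range
        (2, 20170810093000, SENTINEL),        # open-ended via the sentinel value
        (3, 20170812000000, None),            # open-ended via NULL
    ],
)

today = date(2017, 8, 14)  # fixed "current date" so the result is reproducible
conn.execute("CREATE TABLE daterange (shortdate INTEGER)")
day = date(2017, 1, 1)
while day <= today:
    conn.execute("INSERT INTO daterange VALUES (?)", (int(day.strftime("%Y%m%d")),))
    day += timedelta(days=1)

# Integer division by 1000000 strips the HHmmss part of a yyyyMMddHHmmss value.
rows = conn.execute(
    """
    SELECT t.acct_id, d.shortdate AS pr_day
    FROM your_table AS t
    JOIN daterange AS d
      ON d.shortdate BETWEEN t.start_dttm / 1000000
                         AND CASE
                               WHEN t.end_dttm IS NULL OR t.end_dttm = :sentinel
                                 THEN :today
                               ELSE t.end_dttm / 1000000
                             END
    ORDER BY t.acct_id, d.shortdate
    """,
    {"sentinel": SENTINEL, "today": int(today.strftime("%Y%m%d"))},
).fetchall()

for row in rows:
    print(row)
```

Each source row is repeated once per calendar day inside its range, which is the shape of the desired pr_day output in the question.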

Hope that helps,