Using HQL to generate the range of hours between two fields

Time: 2019-09-05 09:41:24

Tags: sql date datetime hive hiveql

I need your help. I am using an HQL query in Hive to load a table. Does anyone have an idea how to load it?

Table USER

START_TIME_DATE |    END_TIME_DATE |  USER   | START_DAY_ID  | END_DAY_ID  
    (String)             (String)    (Bigint)     (Int)           (Int)
210241              231236              1         01092019      01092019
234736              235251              2         01092019      01092019
223408              021345              3         01092019      02092019

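The START_TIME_DATE and END_TIME_DATE fields indicate the times at which the user was at the location. The goal is to produce one row for each hour the user spent there, with the HOUR field holding only the first two digits of the time.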

TABLE USERHOUR

   DATE               |    HOUR    |    ID    
 (Bigint)                (String)    (Bigint)   
 01092019                   21            1
 01092019                   22            1
 01092019                   23            1
 01092019                   23            2            
 01092019                   22            3          
 01092019                   23            3            
 02092019                   00            3
 02092019                   01            3
 02092019                   02            3

Currently my query looks like this, but it does not work. I am trying to use "union all":

insert overwrite table USERHOUR
(select [start_time_date] ,[end_time_date]
    from user
    union all
    select [start_time_date]+1,[end_time_date]
    where [start_time_date]+1<=[end_time_date]
    )
as hour) --generate a range between start_time_date and end_time_date and before cast to Hours,
end_day_id a date,
user as id
from table USER; 

1 answer:

Answer 0 (score: 2)

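To do this, I calculated the difference in hours, generated rows with posexplode(split(space(hours),' ')), computed start timestamp + (exploded position) * 3600, and extracted the hour and the date from the resulting timestamp.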
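
The row-generation step can be looked at in isolation. The following is a minimal sketch, assuming a Hive version that allows a SELECT without a FROM clause; the literal 2 stands in for the computed hours value and is only an illustration.

```sql
-- space(2) returns a string of two spaces; splitting it on ' ' yields
-- three empty strings, so posexplode emits positions 0, 1 and 2.
-- In general, a difference of n hours produces n + 1 rows.
select h.i as pos
from (select 2 as hours) s   -- 2 is an illustrative value, not taken from the table
lateral view posexplode(split(space(s.hours), ' ')) h as i, x;
```

A difference of 0 hours still yields one row (position 0), which is why a user whose start and end fall within the same hour appears exactly once.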

See this demo using your example:

with mydata as(--this is your data
select stack(3,
'210241', '231236', 1, '01092019', '01092019',
'234736', '235251', 2, '01092019', '01092019',
'223408', '021345', 3, '01092019', '02092019'
) as (START_TIME_DATE,END_TIME_DATE,USER,START_DAY_ID,END_DAY_ID))

select --extract date, hour from timestamp calculated
       --this can be done in previous step
       --this subquery is to make code cleaner
       date_format(dtm, 'ddMMyyyy') as DATE, 
       date_format(dtm, 'HH')       as HOUR, 
       user                         as ID
from
(
select user, 
       start, h.i, hours, --these columns are for debugging
       from_unixtime(start+h.i*3600) dtm --add hour (in seconds) to the start unix timestamp
                                         --and convert to timestamp
from
(
select user,
       --start timestamp (unix timestamp in seconds) 
       unix_timestamp(concat(START_DAY_ID, ' ', substr(START_TIME_DATE,1,2)),'ddMMyyyy HH') as start, 
       floor((unix_timestamp(concat(END_DAY_ID, ' ', substr(END_TIME_DATE,1,2)),'ddMMyyyy HH')-
              unix_timestamp(concat(START_DAY_ID, ' ', substr(START_TIME_DATE,1,2)),'ddMMyyyy HH')
             )/ --diff in seconds
        3600) as hours --diff in hours
  from mydata
)s
lateral view posexplode(split(space(cast(s.hours as int)),' ')) h as i,x --this will generate rows
)s
;

Result:

OK
01092019        21      1
01092019        22      1
01092019        23      1
01092019        23      2
01092019        22      3
01092019        23      3
02092019        00      3
02092019        01      3
02092019        02      3
Time taken: 3.207 seconds, Fetched: 9 row(s)
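
The rollover to the next day for user 3 comes from the timestamp arithmetic itself, not from any special handling. As a quick check, assuming the same date patterns as in the demo and no daylight-saving change inside the interval in the session time zone, the row for position 2 can be computed on its own:

```sql
-- user 3 starts on 01092019 at hour 22; position 2 adds 2 * 3600 seconds
select date_format(dtm, 'ddMMyyyy') as dt,
       date_format(dtm, 'HH')       as hr
from (
  select from_unixtime(
           unix_timestamp('01092019 22', 'ddMMyyyy HH') + 2 * 3600
         ) as dtm
) t;
```

This should return 02092019 and 00, matching the first 02092019 row for user 3 in the result.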