I need your help. I'm writing an HQL query in Hive to load a table. Does anyone have an idea how to load it?
Table USER
START_TIME_DATE | END_TIME_DATE | USER     | START_DAY_ID | END_DAY_ID
(string)        | (string)      | (bigint) | (int)        | (int)
210241          | 231236        | 1        | 01092019     | 01092019
234736          | 235251        | 2        | 01092019     | 01092019
223408          | 021345        | 3        | 01092019     | 02092019
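The START_TIME_DATE and END_TIME_DATE fields (HHmmss) give the times the user arrived at and left the location. The goal is to produce a separate row for every hour the user spent there, with only the first two digits (the hour) in the HOUR field.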
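To pin down the expected transformation, here is a minimal Python sketch (not HiveQL; the function name `hours_spent` is hypothetical). It expands one USER row into USERHOUR rows, assuming times are `HHmmss` strings, day IDs are `ddMMyyyy` strings, and only the hour part of each time matters:

```python
from datetime import datetime, timedelta


def hours_spent(start_time, end_time, user, start_day, end_day):
    """Return one (DATE, HOUR, ID) tuple per clock hour from start to end, inclusive.

    start_time/end_time are 'HHmmss' strings; start_day/end_day are 'ddMMyyyy'.
    Minutes and seconds are dropped: only the first two digits (the hour) count.
    """
    fmt = "%d%m%Y%H"
    current = datetime.strptime(start_day + start_time[:2], fmt)
    end = datetime.strptime(end_day + end_time[:2], fmt)
    rows = []
    while current <= end:
        rows.append((current.strftime("%d%m%Y"), current.strftime("%H"), user))
        current += timedelta(hours=1)  # step one hour; crosses midnight correctly
    return rows


for row in hours_spent("223408", "021345", 3, "01092019", "02092019"):
    print(*row)
```

For user 3 this yields the five rows from 01092019 22 through 02092019 02, matching the expected USERHOUR rows.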
Table USERHOUR
DATE     | HOUR     | ID
(bigint) | (string) | (bigint)
01092019 | 21       | 1
01092019 | 22       | 1
01092019 | 23       | 1
01092019 | 23       | 2
01092019 | 22       | 3
01092019 | 23       | 3
02092019 | 00       | 3
02092019 | 01       | 3
02092019 | 02       | 3
Currently my query looks like this, but it doesn't work. I'm trying UNION ALL:
insert overwrite table USERHOUR
(select [start_time_date] ,[end_time_date]
from user
union all
select [start_time_date]+1,[end_time_date]
where [start_time_date]+1<=[end_time_date]
)
as hour) --generate a range between start_time_date and end_time_date and before cast to Hours,
end_day_id a date,
user as id
from table USER;
Answer (score: 2)
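To do this, I calculated the difference in hours, generated one row per hour with posexplode(split(space(hours),' ')), computed the start timestamp + (exploded position) * 3600, and extracted the hour and date from the resulting timestamp.
See this demo using your example data: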
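The row-generation trick can be checked outside Hive. This Python sketch (the helper name `posexplode_split_space` is hypothetical) mimics `posexplode(split(space(n), ' '))`: `n` blanks split on a single space give `n + 1` empty strings, so the positions run from 0 to `n` and cover both the start hour and the end hour. It assumes Hive's `split` keeps trailing empty strings, which is consistent with the nine rows in the answer's result.

```python
def posexplode_split_space(n):
    """Mimic Hive's posexplode(split(space(n), ' ')) as (position, value) pairs.

    space(n) is a string of n blanks; splitting it on ' ' yields n + 1 empty
    strings, so positions 0..n are produced: one row per hour, both ends included.
    """
    parts = (" " * n).split(" ")  # Python's str.split keeps empty strings
    return list(enumerate(parts))


# hours = 4 for user 3 (22:00 on day 1 to 02:00 on day 2) -> positions 0..4
print([pos for pos, _ in posexplode_split_space(4)])
```

Because position 0 is always emitted, a visit that starts and ends in the same hour (user 2, hours = 0) still produces exactly one row.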
with mydata as(--this is your data
select stack(3,
'210241', '231236', 1, '01092019', '01092019',
'234736', '235251', 2, '01092019', '01092019',
'223408', '021345', 3, '01092019', '02092019'
) as (START_TIME_DATE,END_TIME_DATE,USER,START_DAY_ID,END_DAY_ID))
select --extract the date and hour from the calculated timestamp
       --(this could be done in the previous step;
       -- the extra subquery just keeps the code cleaner)
date_format(dtm, 'ddMMyyyy') as DATE,
date_format(dtm, 'HH') as HOUR,
user as ID
from
(
select user,
start, h.i, hours, --these columns are for debugging
from_unixtime(start+h.i*3600) dtm --add hour (in seconds) to the start unix timestamp
--and convert to timestamp
from
(
select user,
--start timestamp (unix timestamp in seconds)
unix_timestamp(concat(START_DAY_ID, ' ', substr(START_TIME_DATE,1,2)),'ddMMyyyy HH') as start,
floor((unix_timestamp(concat(END_DAY_ID, ' ', substr(END_TIME_DATE,1,2)),'ddMMyyyy HH')-
unix_timestamp(concat(START_DAY_ID, ' ', substr(START_TIME_DATE,1,2)),'ddMMyyyy HH')
)/ --diff in seconds
3600) as hours --diff in hours
from mydata
)s
lateral view posexplode(split(space(cast(s.hours as int)),' ')) h as i,x --this will generate rows
)s
;
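The same arithmetic can be re-expressed in Python to cross-check the result. This is a sketch, not the answer's code: `to_epoch` and `expand` are hypothetical helpers, and they work in UTC. Hive's `unix_timestamp`/`from_unixtime` are assumed here to use the Hive server's local time zone instead, so if a DST change falls inside a range, the Hive hour count could differ from this sketch.

```python
import calendar
import time


def to_epoch(day_id, time_str):
    """Like unix_timestamp(concat(day_id, ' ', substr(time_str,1,2)), 'ddMMyyyy HH'),
    but evaluated in UTC (assumption, to keep DST out of the picture)."""
    return calendar.timegm(time.strptime(day_id + " " + time_str[:2], "%d%m%Y %H"))


def expand(start_time, end_time, user, start_day, end_day):
    start = to_epoch(start_day, start_time)
    hours = (to_epoch(end_day, end_time) - start) // 3600  # floor(diff_in_seconds / 3600)
    rows = []
    for i in range(hours + 1):  # posexplode positions 0..hours
        dtm = time.gmtime(start + i * 3600)  # like from_unixtime(start + h.i*3600), in UTC
        rows.append((time.strftime("%d%m%Y", dtm), time.strftime("%H", dtm), user))
    return rows


data = [
    ("210241", "231236", 1, "01092019", "01092019"),
    ("234736", "235251", 2, "01092019", "01092019"),
    ("223408", "021345", 3, "01092019", "02092019"),
]
for record in data:
    for row in expand(*record):
        print(*row)
```

Running it prints the same nine rows as the Hive result. Concatenating the day ID before converting is what lets user 3's visit cross midnight without any special-casing.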
Result:
OK
01092019 21 1
01092019 22 1
01092019 23 1
01092019 23 2
01092019 22 3
01092019 23 3
02092019 00 3
02092019 01 3
02092019 02 3
Time taken: 3.207 seconds, Fetched: 9 row(s)