Hive window functions across multiple date ranges

Time: 2016-07-06 14:06:46

Tags: sql hadoop hive

My table looks like this:

TagName  | DateTime            | Value
TagName1 | 2016-07-06 09:49:34 | 14
TagName1 | 2016-07-06 09:50:34 | 15
TagName1 | 2016-07-06 09:51:34 | 18
TagName2 | 2016-07-03 02:13:34 | 421
TagName2 | 2016-07-03 03:13:34 | 422
TagName3 | 2016-07-01 03:13:34 | 14
TagName3|2016-07-01 03:13:34|14

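What I want to do is run several aggregations on this table (for example sum, weighted average, latest value, count) for each TagName, over a set of defined date ranges.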

Here is what I have so far:

SELECT *
FROM
(
SELECT
t1.TagName,
reflect("java.util.UUID", "randomUUID") as rv_id,
t2.item_id as rs_id,
from_unixtime(unix_timestamp()) as tstamp,
t1.datetime as last_date,
t1.value as last_value,
t1.minimum as minimum,
t1.maximum as maximum,
t1.count as count,
t1.total as total,
t1.average as average,
SUM(t1.weight_value) OVER (PARTITION BY TagName) as weighted_average,
t1.Rank as Rank
FROM
(SELECT
TagName,
value,
datetime,
MIN(value) OVER (PARTITION BY TagName) as minimum,
MAX(value) OVER (PARTITION BY TagName) as maximum,
ROW_NUMBER() OVER (PARTITION BY TagName ORDER BY datetime DESC) as Rank,
SUM(value) OVER (PARTITION BY TagName) as total,
COUNT(value) OVER (PARTITION BY TagName) as count,
AVG(value) OVER (PARTITION BY TagName) as average,
-- time-weighted value: the summed gaps between rows telescope to MAX - MIN of the timestamps,
-- which avoids nesting a LAG window function inside SUM(...) OVER
(unix_timestamp(datetime) - LAG(unix_timestamp(datetime),1) OVER (PARTITION BY TagName ORDER BY datetime))/
(MAX(unix_timestamp(datetime)) OVER (PARTITION BY TagName) - MIN(unix_timestamp(datetime)) OVER (PARTITION BY TagName)) * 
(LAG(value,1) OVER (PARTITION BY TagName ORDER BY datetime)) as weight_value
FROM raw.analog_history_dynamic
WHERE par_date > date_format(date_sub(to_date(current_date), 5),'yyyyMMdd')) t1
LEFT JOIN meta.item_meta t2
ON t1.TagName = t2.name) t3
WHERE t3.Rank =1; 
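For reference, this is what the weight_value and weighted_average columns compute for one tag, writing t_i for unix_timestamp(datetime) and v_i for value of the i-th row in time order (i = 1, ..., n). The first row has no LAG, so its weight is NULL and SUM skips it. The sum of the gaps telescopes, which is why the denominator can equally be written as MAX(ts) - MIN(ts) per tag instead of a SUM over a LAG.

```latex
w_i = \frac{t_i - t_{i-1}}{\sum_{j=2}^{n} (t_j - t_{j-1})}\, v_{i-1}
    = \frac{t_i - t_{i-1}}{t_n - t_1}\, v_{i-1}, \qquad i = 2, \dots, n

\bar{v}_{\mathrm{tw}} = \sum_{i=2}^{n} w_i
    = \frac{1}{t_n - t_1} \sum_{i=2}^{n} (t_i - t_{i-1})\, v_{i-1}
```

With the TagName1 rows (60 s apart, values 14, 15, 18) this gives (60 * 14 + 60 * 15) / 120 = 14.5; the latest value, 18, gets no weight because no later row closes its interval. A tag with a single row, such as TagName3, has only a NULL weight, so its weighted_average comes out NULL.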

In this case I'm looking at the last 5 days:

WHERE par_date > date_format(date_sub(to_date(current_date), 5),'yyyyMMdd')

Besides the 5 days, I have 10 other ranges that I also need to compute:

-- unix_timestamp() returns seconds, so N minutes = N * 60

-- 1 min
WHERE par_date > date_format(date_sub(to_date(current_date), 1),'yyyyMMdd')
and unix_timestamp(datetime) > unix_timestamp(current_timestamp) - 60;

-- 5 min
WHERE par_date > date_format(date_sub(to_date(current_date), 1),'yyyyMMdd')
and unix_timestamp(datetime) > unix_timestamp(current_timestamp) - 300;

-- 10 min
WHERE par_date > date_format(date_sub(to_date(current_date), 1),'yyyyMMdd')
and unix_timestamp(datetime) > unix_timestamp(current_timestamp) - 600;

-- 30 min
WHERE par_date > date_format(date_sub(to_date(current_date), 1),'yyyyMMdd')
and unix_timestamp(datetime) > unix_timestamp(current_timestamp) - 1800;

-- 1 Month
WHERE par_date > date_format(date_sub(to_date(current_date), 30),'yyyyMMdd');

-- 2 Months
WHERE par_date > date_format(date_sub(to_date(current_date), 60),'yyyyMMdd');
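Because unix_timestamp() returns whole seconds since the epoch, the minute-based cutoffs should be multiples of 60 rather than of 60000 (60000 s is roughly 16.7 hours). A quick way to check every cutoff is to compute them once. This is a sketch assuming Hive 1.2+ (for current_date and date_format) and a 'yyyyMMdd' string par_date, as in the question.

```sql
-- Sanity-check every range cutoff in one SELECT (Hive allows SELECT without FROM).
-- Minute cutoffs: epoch seconds minus N * 60, shown as readable timestamps.
SELECT
  from_unixtime(unix_timestamp(current_timestamp) - 60)      AS cutoff_1min,
  from_unixtime(unix_timestamp(current_timestamp) - 5 * 60)  AS cutoff_5min,
  from_unixtime(unix_timestamp(current_timestamp) - 10 * 60) AS cutoff_10min,
  from_unixtime(unix_timestamp(current_timestamp) - 30 * 60) AS cutoff_30min,
  -- Day-based cutoffs are compared against the par_date partition column
  date_format(date_sub(current_date, 5), 'yyyyMMdd')  AS par_cutoff_5day,
  date_format(date_sub(current_date, 30), 'yyyyMMdd') AS par_cutoff_1month,
  date_format(date_sub(current_date, 60), 'yyyyMMdd') AS par_cutoff_2month;
```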

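At the very least, I think I'd want to combine the ranges that fall under the same partitions, e.g. all of the < 1 day aggregations (the table is partitioned by date).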
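One way to act on that idea is conditional aggregation: scan the partitions once, turn each range into its own CASE expression inside an ordinary aggregate, and group by TagName. This is a sketch rather than a drop-in replacement: it covers count, total and average only, uses seconds-based cutoffs, and column names such as total_1min are illustrative.

```sql
-- One scan of the last 5 days of partitions; one column per range and measure.
-- The ranges overlap (a row from the last minute is also in the last 5 minutes),
-- so each range gets its own CASE and a row can count towards several ranges.
SELECT
  TagName,
  COUNT(CASE WHEN ts > now_ts - 60   THEN value END) AS count_1min,
  SUM(CASE WHEN ts > now_ts - 60     THEN value END) AS total_1min,
  AVG(CASE WHEN ts > now_ts - 60     THEN value END) AS average_1min,
  COUNT(CASE WHEN ts > now_ts - 300  THEN value END) AS count_5min,
  SUM(CASE WHEN ts > now_ts - 300    THEN value END) AS total_5min,
  AVG(CASE WHEN ts > now_ts - 300    THEN value END) AS average_5min,
  COUNT(CASE WHEN ts > now_ts - 1800 THEN value END) AS count_30min,
  SUM(CASE WHEN ts > now_ts - 1800   THEN value END) AS total_30min,
  AVG(CASE WHEN ts > now_ts - 1800   THEN value END) AS average_30min,
  -- Every scanned row is inside the 5-day range, so no CASE is needed here
  COUNT(value) AS count_5day,
  SUM(value)   AS total_5day,
  AVG(value)   AS average_5day
FROM (
  SELECT
    TagName,
    value,
    unix_timestamp(datetime)          AS ts,
    unix_timestamp(current_timestamp) AS now_ts
  FROM raw.analog_history_dynamic
  WHERE par_date > date_format(date_sub(current_date, 5), 'yyyyMMdd')
) h
GROUP BY TagName;
```

A CASE without ELSE yields NULL, which COUNT, SUM and AVG ignore, so a range with no rows gets a count of 0 and a NULL total and average. Latest value and the time-weighted average depend on row order, which plain GROUP BY aggregates do not see, so those are easier to keep in window functions.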

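Any ideas or suggestions on how to combine all of these calculations into one query, instead of running each one separately with a different WHERE condition?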

Thanks

1 Answer

Answer 0 (score: 0)

You could move the conditions from your WHERE clauses into the SELECT list as CASE WHEN expressions, e.g.:

SELECT *
FROM
(
SELECT
t1.TagName,
reflect("java.util.UUID", "randomUUID") as rv_id,
t2.item_id as rs_id,
from_unixtime(unix_timestamp()) as tstamp,
t1.datetime as last_date,
t1.value as last_value,
t1.flag,
t1.minimum as minimum,
t1.maximum as maximum,
t1.count as count,
t1.total as total,
t1.average as average,
SUM(t1.weight_value) OVER (PARTITION BY t1.TagName, t1.flag) as weighted_average,
t1.Rank as Rank
FROM
(SELECT
TagName,
value,
datetime,
flag,
MIN(value) OVER (PARTITION BY TagName, flag) as minimum,
MAX(value) OVER (PARTITION BY TagName, flag) as maximum,
ROW_NUMBER() OVER (PARTITION BY TagName, flag ORDER BY datetime DESC) as Rank,
SUM(value) OVER (PARTITION BY TagName, flag) as total,
COUNT(value) OVER (PARTITION BY TagName, flag) as count,
AVG(value) OVER (PARTITION BY TagName, flag) as average,
(unix_timestamp(datetime) - LAG(unix_timestamp(datetime),1) OVER (PARTITION BY TagName, flag ORDER BY datetime))/
(MAX(unix_timestamp(datetime)) OVER (PARTITION BY TagName, flag) - MIN(unix_timestamp(datetime)) OVER (PARTITION BY TagName, flag)) *
(LAG(value,1) OVER (PARTITION BY TagName, flag ORDER BY datetime)) as weight_value
FROM
-- compute the flag first, so the window functions above can partition by it
(SELECT
TagName,
value,
datetime,
case
-- unix_timestamp() is in seconds: 60 = 1 minute, 300 = 5 minutes
-- the first matching WHEN wins; rows matching none get a NULL flag
when par_date > date_format(date_sub(to_date(current_date), 1),'yyyyMMdd')
and unix_timestamp(datetime) > unix_timestamp(current_timestamp) - 60
then 'flag_1min'
when par_date > date_format(date_sub(to_date(current_date), 1),'yyyyMMdd')
and unix_timestamp(datetime) > unix_timestamp(current_timestamp) - 300
then 'flag_5min'
-- when ... and so on
end as flag
FROM raw.analog_history_dynamic
WHERE par_date > date_format(date_sub(to_date(current_date), 5),'yyyyMMdd')) t0
) t1
LEFT JOIN meta.item_meta t2
ON t1.TagName = t2.name) t3
WHERE t3.Rank = 1;

NOTE: no GROUP BY is needed here, because every aggregate is a window function (OVER ...); partitioning the windows by TagName and flag already gives one set of results per tag and range. GROUP BY is only required for plain aggregates written without OVER.
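A limitation of the flag approach is that CASE assigns each row to its first matching range only, while the ranges in the question overlap (the last 5 minutes include the last minute). One way around that, sketched here under the same assumptions (seconds-based cutoffs, only four of the ranges written out, labels such as '1min' chosen for illustration), is to duplicate each row once per range it falls into with LATERAL VIEW explode, then partition every window by TagName and the range label. The weighted-average denominator uses MAX(ts) - MIN(ts), which equals the sum of the gaps between consecutive rows.

```sql
-- Sketch: expand each row into one copy per range it falls in, then reuse the
-- question's window functions partitioned by (TagName, range_name).
SELECT *
FROM (
  SELECT
    b.*,
    -- Time-weighted average per tag and range (NULL weights are ignored by SUM)
    SUM(b.weight_value) OVER (PARTITION BY b.TagName, b.range_name) AS weighted_average
  FROM (
    SELECT
      e.TagName,
      e.range_name,
      e.datetime AS last_date,
      e.value    AS last_value,
      MIN(e.value)   OVER (PARTITION BY e.TagName, e.range_name) AS minimum,
      MAX(e.value)   OVER (PARTITION BY e.TagName, e.range_name) AS maximum,
      SUM(e.value)   OVER (PARTITION BY e.TagName, e.range_name) AS total,
      COUNT(e.value) OVER (PARTITION BY e.TagName, e.range_name) AS count,
      AVG(e.value)   OVER (PARTITION BY e.TagName, e.range_name) AS average,
      ROW_NUMBER() OVER (PARTITION BY e.TagName, e.range_name ORDER BY e.datetime DESC) AS rnk,
      -- Gap to the previous row times the previous value, over the range's total span
      (e.ts - LAG(e.ts, 1) OVER (PARTITION BY e.TagName, e.range_name ORDER BY e.datetime))
        * LAG(e.value, 1) OVER (PARTITION BY e.TagName, e.range_name ORDER BY e.datetime)
        / (MAX(e.ts) OVER (PARTITION BY e.TagName, e.range_name)
           - MIN(e.ts) OVER (PARTITION BY e.TagName, e.range_name)) AS weight_value
    FROM (
      SELECT h.TagName, h.value, h.datetime, h.ts, r.range_name
      FROM (
        SELECT
          TagName,
          value,
          datetime,
          unix_timestamp(datetime)          AS ts,
          unix_timestamp(current_timestamp) AS now_ts
        FROM raw.analog_history_dynamic
        WHERE par_date > date_format(date_sub(current_date, 5), 'yyyyMMdd')
      ) h
      -- One output row per non-NULL label; a row from the last minute yields four rows
      LATERAL VIEW explode(array(
        CASE WHEN h.ts > h.now_ts - 60   THEN '1min'  END,
        CASE WHEN h.ts > h.now_ts - 300  THEN '5min'  END,
        CASE WHEN h.ts > h.now_ts - 1800 THEN '30min' END,
        '5day'
      )) r AS range_name
      WHERE r.range_name IS NOT NULL
    ) e
  ) b
) c
-- Keep the latest row per tag and range; it carries all the per-range results
WHERE c.rnk = 1;
```

To cover the 1-month and 2-month ranges, add CASE entries for them to the array and widen the par_date filter to 60 days. The rv_id, rs_id and item_meta join from the question should fit into the outermost SELECT as before, joining on TagName.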