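I have a table that records the size of various servers, along with the date each server was scanned. I need to get the latest entry for each server in each month. How can I do this in Impala SQL? Thanks a lot.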
Data Server Size
11/4/2017 ABC 200
11/18/2017 ABC 700
11/25/2017 ABC 1009
12/4/2017 ABC 200
12/18/2017 ABC 700
12/20/2017 ABC 1100
1/4/2018 ABC 200
1/18/2018 ABC 700
1/20/2018 ABC 1009
11/4/2017 CAD 200
11/18/2017 CAD 700
11/25/2017 CAD 1009
12/4/2017 CAD 200
12/18/2017 CAD 700
12/20/2017 CAD 1100
Expected result:
Data Server Size
11/25/2017 ABC 1009
12/20/2017 ABC 1100
1/20/2018 ABC 1009
11/25/2017 CAD 1009
12/20/2017 CAD 1100
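To try the queries in the answers, the sample data can be loaded into a small test table. This is a minimal sketch under stated assumptions: the question does not give the table name or the column types, so the table is assumed to be called t (the name the answers use), data is assumed to be a TIMESTAMP, and the M/D/YYYY dates above are rewritten as ISO yyyy-MM-dd strings so CAST can parse them.

```sql
-- Hypothetical test table: name t and the column types are assumptions.
-- Backticks are only a precaution in case a column name collides with an Impala keyword.
CREATE TABLE t (`data` TIMESTAMP, `server` STRING, `size` INT);

-- Sample rows from the question, with dates converted from M/D/YYYY to yyyy-MM-dd.
INSERT INTO t VALUES
  (CAST('2017-11-04' AS TIMESTAMP), 'ABC', 200),
  (CAST('2017-11-18' AS TIMESTAMP), 'ABC', 700),
  (CAST('2017-11-25' AS TIMESTAMP), 'ABC', 1009),
  (CAST('2017-12-04' AS TIMESTAMP), 'ABC', 200),
  (CAST('2017-12-18' AS TIMESTAMP), 'ABC', 700),
  (CAST('2017-12-20' AS TIMESTAMP), 'ABC', 1100),
  (CAST('2018-01-04' AS TIMESTAMP), 'ABC', 200),
  (CAST('2018-01-18' AS TIMESTAMP), 'ABC', 700),
  (CAST('2018-01-20' AS TIMESTAMP), 'ABC', 1009),
  (CAST('2017-11-04' AS TIMESTAMP), 'CAD', 200),
  (CAST('2017-11-18' AS TIMESTAMP), 'CAD', 700),
  (CAST('2017-11-25' AS TIMESTAMP), 'CAD', 1009),
  (CAST('2017-12-04' AS TIMESTAMP), 'CAD', 200),
  (CAST('2017-12-18' AS TIMESTAMP), 'CAD', 700),
  (CAST('2017-12-20' AS TIMESTAMP), 'CAD', 1100);
```

INSERT ... VALUES writes small files, which is fine for a toy table like this but not how real scan data would normally be loaded.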
Answer 0 (score: 0)
Impala supports window (analytic) functions, so you can do something like this:
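-- t is the table from the question; data is assumed to be a TIMESTAMP column,
-- which trunc(data, 'MONTH') requires.
select t.*
from (select t.*,
             -- number rows within each (server, month), newest first
             row_number() over (partition by server, trunc(data, 'MONTH')
                                order by data desc
                               ) as seqnum
      from t
     ) t
where seqnum = 1;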
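A possible variant of the same query, sketched here rather than taken from the answer: it returns only the original columns (dropping the helper seqnum column) and orders the output so it is easy to compare against the expected result. It assumes, as above, that data is a TIMESTAMP; on the sample data it should return the five rows listed under "Expected result".

```sql
-- Same row_number() technique; only the projection and ordering differ.
SELECT data, server, size
FROM (SELECT t.*,
             row_number() OVER (PARTITION BY server, trunc(data, 'MONTH')
                                ORDER BY data DESC) AS seqnum
      FROM t) ranked
WHERE seqnum = 1            -- keep only the newest row per server per month
ORDER BY server, data;
```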
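Edit:
The query above returns the latest row for each server in each month. If you want only one row per month across all servers, remove server from the partition by: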
select t.*
from (select t.*,
             row_number() over (partition by trunc(data, 'MONTH')
                                order by data desc
                               ) as seqnum
      from t
     ) t
where seqnum = 1;
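One caveat with the one-row-per-month version: row_number() assigns distinct numbers even to tied rows, so when two servers share the latest date in a month (as ABC and CAD do on 12/20/2017 in the sample) only one of them comes back, and which one is not deterministic. A minimal sketch that keeps all ties swaps in rank(); on the sample data it should return ABC and CAD for 11/25/2017 and 12/20/2017, plus ABC for 1/20/2018.

```sql
-- rank() gives every row that ties for the latest date in a month the value 1,
-- so tied servers are all kept instead of one being picked arbitrarily.
SELECT data, server, size
FROM (SELECT t.*,
             rank() OVER (PARTITION BY trunc(data, 'MONTH')
                          ORDER BY data DESC) AS rnk
      FROM t) ranked
WHERE rnk = 1
ORDER BY data, server;
```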
Answer 1 (score: 0)
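-- YEAR(data) is part of the grouping and join keys so that the same month
-- in different years (e.g. Nov 2017 and Nov 2018) is not merged.
SELECT t.*
FROM t
INNER JOIN
  (SELECT YEAR(data) AS yr, MONTH(data) AS mon, MAX(DAY(data)) AS max_day, server
   FROM t
   GROUP BY YEAR(data), MONTH(data), server) sub
  ON (YEAR(t.data) = sub.yr AND MONTH(t.data) = sub.mon
      AND DAY(t.data) = sub.max_day AND t.server = sub.server);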
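The subquery selects the latest day of the month for each server in each month. Joining that result back to the main table then removes every row that is not the latest one for its server and month.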
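A possible simplification of the join approach, offered as a sketch rather than as the answerer's method: take MAX(data) per server and calendar month directly and join on the full timestamp, instead of comparing year, month and day separately. Grouping on trunc(data, 'MONTH') keeps the year as part of the key, so the same month in different years is never merged. It assumes data is a TIMESTAMP; month_start is a hypothetical alias added only for readability.

```sql
-- Latest timestamp per server per calendar month, joined back to fetch the full row.
SELECT t.*
FROM t
INNER JOIN (
  SELECT server,
         trunc(data, 'MONTH') AS month_start,   -- first instant of the month
         MAX(data) AS max_data                  -- latest scan in that month
  FROM t
  GROUP BY server, trunc(data, 'MONTH')
) sub
  ON t.server = sub.server AND t.data = sub.max_data;
```

Like the day-based join, this returns more than one row for a server and month if that server has several rows with exactly the same latest timestamp.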