I have a table that looks like this: column 1 = timestamp:string, column 2 = numOfentites:int. Note that I am using HiveQL.
Fri, 10 Aug 2001 274
Fri, 10 Dec 1999 39
Fri, 10 Mar 2000 107
Fri, 10 May 2002 26
Fri, 10 Nov 2000 351
Fri, 10 Sep 1999 22
Fri, 11 Aug 2000 189
Fri, 11 Dec 1998 1
Fri, 11 Feb 2000 84
Fri, 11 Jan 2002 580
Fri, 11 Jun 1999 12
Fri, 11 May 2001 571
Fri, 12 Apr 2002 41
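Now, I have retrieved the per-year frequency from this table and found that a certain year, XXXX, has the highest number of entities.
My goal now is to go one level deeper and extract the per-month frequency for XXXX.
I tried using a GROUP BY clause on the substring that indicates the month, but it didn't work.
Could you point me in the right direction? Just a hint rather than the answer :P I'm trying to learn HiveQL here.
Edit: This is the query I used to extract the entity frequency per year. Note that the timestamp is the first column of the input.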
select dates , count(dates) as numEmails
from (select split(timestamp," ")[3] as dates , count(timestamp)
from dataset
group by timestamp
) mailfreq
group by dates
order by numEmails desc;
Answer 0: (score: 0)
I know HiveQL has some odd limitations, but wouldn't this work?
select split(timestamp," ")[3] as yr, split(timestamp," ")[2] as mon, count(timestamp)
from dataset
group by split(timestamp," ")[3], split(timestamp," ")[2];
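
To drill into a single year, one option is to parse the year and month in a subquery and then filter and group in the outer query. The following is only a sketch under a few assumptions: '2001' is a placeholder for the year you found (XXXX in the question), the alias names (parsed, mon, yr, total_entities) are made up for illustration, and the per-month figure is taken to be the sum of numOfentites rather than a row count.

```sql
-- Sketch only: '2001' stands in for the year of interest (XXXX in the question).
-- `timestamp` is backquoted because, depending on the Hive version and settings,
-- it may be treated as a reserved word.
SELECT mon,
       SUM(numOfentites) AS total_entities   -- assumption: sum the entity counts per month
FROM (
    -- 'Fri, 10 Aug 2001' splits on spaces into ['Fri,', '10', 'Aug', '2001'],
    -- so index 2 is the month and index 3 is the year.
    SELECT split(`timestamp`, ' ')[2] AS mon,
           split(`timestamp`, ' ')[3] AS yr,
           numOfentites
    FROM dataset
) parsed
WHERE yr = '2001'
GROUP BY mon
ORDER BY total_entities DESC;
```

If you want the number of rows per month rather than the total of numOfentites, swapping SUM(numOfentites) for COUNT(*) should give that instead.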