Group by a substring of a field

Posted: 2014-02-23 18:09:35

Tags: sql hiveql

I have a table that looks like this: column 1 = timestamp:string, column 2 = numOfentites:int. Note that I am using HiveQL.

Fri, 10 Aug 2001    274
Fri, 10 Dec 1999    39
Fri, 10 Mar 2000    107
Fri, 10 May 2002    26
Fri, 10 Nov 2000    351
Fri, 10 Sep 1999    22
Fri, 11 Aug 2000    189
Fri, 11 Dec 1998    1
Fri, 11 Feb 2000    84
Fri, 11 Jan 2002    580
Fri, 11 Jun 1999    12
Fri, 11 May 2001    571
Fri, 12 Apr 2002    41
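As a hint in the same spirit as the question: Hive's `split(str, regex)` returns a zero-indexed array of strings, so each timestamp value above breaks on single spaces into four tokens. A minimal check using one sample value from the table, assuming Hive 0.13 or later (which accepts a SELECT without a FROM clause):

```sql
-- split() takes a regex; a single space is matched literally here.
-- Tokens of 'Fri, 10 Aug 2001': [0] 'Fri,'  [1] '10'  [2] 'Aug'  [3] '2001'
SELECT split('Fri, 10 Aug 2001', ' ')[2] AS mon,  -- month abbreviation
       split('Fri, 10 Aug 2001', ' ')[3] AS yr;   -- four-digit year
```

Index 3 is the year token the question's query already uses; index 2 is the month token needed for the next level down.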

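Using this table, I computed the frequency per year and found that certain years (XXXX) have the highest number of entities.

My goal now is to drill down one level and extract the monthly frequency within XXXX.

I tried using a GROUP BY clause on the substring that indicates the month, but it did not work.

Can you point me in the right direction?

I only need a hint, not the full answer :P I'm trying to learn HiveQL here.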

Edit: Here is the query I used to extract the per-year frequency of entities. Note that the timestamp is the first column of the input.

select  dates , count(dates) as numEmails
from (select split(timestamp," ")[3] as dates , count(timestamp)
      from dataset
      group by timestamp
     ) mailfreq
group by dates
order by numEmails desc;
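Because the inner query groups by the full `timestamp` value, the outer `count(dates)` counts distinct timestamp values per year rather than input rows. If each row of `dataset` is one email (an assumption about the schema), a single-level query returns the per-year row count directly; if instead each row already carries a count in its second column, `sum()` over that column would be the equivalent. A sketch under the one-row-per-email assumption, with `timestamp` backquoted because it is a reserved word in newer Hive versions:

```sql
-- Assumes one row per email/entity in `dataset` (an assumed reading of the schema).
-- Hive cannot GROUP BY a SELECT alias, so the split expression is repeated.
SELECT split(`timestamp`, ' ')[3] AS yr,
       count(*)                   AS numEmails
FROM dataset
GROUP BY split(`timestamp`, ' ')[3]
ORDER BY numEmails DESC;
```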

1 answer:

Answer 0 (score: 0)

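I know HiveQL has some odd restrictions, but wouldn't this work?

select split(timestamp," ")[3] as yr, split(timestamp," ")[2] as mon, count(timestamp)
from dataset
-- Hive does not accept SELECT aliases (yr, mon) in GROUP BY, so the expressions are repeated
group by split(timestamp," ")[3], split(timestamp," ")[2];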
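Building on the answer, the monthly breakdown for one specific year can filter on the year token in WHERE. The sketch below uses '2001' purely as a placeholder for the question's XXXX. Ordering by the month abbreviation would sort alphabetically (Apr, Aug, Dec, ...), so it also derives a two-digit month number with `unix_timestamp(string, pattern)` and `from_unixtime(bigint, pattern)`; the pattern 'EEE, dd MMM yyyy' is an assumption about the stored format, and parsing English day and month names may depend on the JVM locale.

```sql
-- '2001' is a placeholder for the year of interest (XXXX in the question).
-- Assumes `timestamp` values look like 'Fri, 10 Aug 2001'.
SELECT from_unixtime(unix_timestamp(`timestamp`, 'EEE, dd MMM yyyy'), 'MM') AS mon_num,
       split(`timestamp`, ' ')[2]                                         AS mon,
       count(*)                                                           AS numEmails
FROM dataset
WHERE split(`timestamp`, ' ')[3] = '2001'
GROUP BY from_unixtime(unix_timestamp(`timestamp`, 'EEE, dd MMM yyyy'), 'MM'),
         split(`timestamp`, ' ')[2]
ORDER BY mon_num;
```

Dropping the WHERE clause and grouping by the year token as well gives the same breakdown for every year at once, which is what the answer's query does.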