I have a table in the following format:
```
--------------------------------------
ID | col1  | date_ts
--------------------------------------
1  | type1 | 2011-10-01 23:59:59.163-08
2  | type1 | 2011-10-02 21:42:20.152-19
3  | type2 | 2011-10-03 23:21:49.175-21
4  | type3 | 2011-10-03 23:19:39.169-12
5  | type2 | 2011-10-05 23:34:30.129-01
```
I am trying to group by date and get the count of each type in col1.
This is the output I want to get:
```
date       | type1 | type2 | type3 |
------------------------------------------
2011-10-01 | 1     | 0     | 0     |
2011-10-02 | 1     | 0     | 0     |
2011-10-03 | 0     | 1     | 1     |
2011-10-05 | 0     | 1     | 0     |
```
This is the query I have so far, but it throws a runtime error:
```sql
set hive.cli.print.header=true;
select
sum(if(col1 = 'type1', 1, 0)) as type_1,
sum(if(col1 = 'type2', 1, 0)) as type_2,
sum(if(col1 = 'type3', 1, 0)) as type_3
from table1
WHERE unix_timestamp(date_ts) >= unix_timestamp('2011-10-01 00:00:00.178-01')
  AND unix_timestamp(date_ts) <= unix_timestamp('2011-10-05 23:59:59.168-08')
GROUP BY col1, TO_DATE(date_ts)
ORDER BY date_ts;
```
Any ideas on how to do this? Thanks.
Answer 0 (score: 1)
You need to expose date_ts in the projected columns:
```sql
select to_date(date_ts) date_ts,
sum(if(col1 = 'type1', 1, 0)) as type_1,
sum(if(col1 = 'type2', 1, 0)) as type_2,
sum(if(col1 = 'type3', 1, 0)) as type_3
from table1
WHERE unix_timestamp(date_ts) >= unix_timestamp('2011-10-01 00:00:00.178-01')
  AND unix_timestamp(date_ts) <= unix_timestamp('2011-10-05 23:59:59.168-08')
GROUP BY col1, TO_DATE(date_ts)
ORDER BY date_ts;
```
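To see what this grouping produces on the sample rows, here is a minimal sketch that replays it using Python's built-in sqlite3 module as a stand-in for Hive (an assumption: SQLite has no if() or to_date(), so CASE WHEN and substr(date_ts, 1, 10) are substituted, and the WHERE range is dropped because every sample row falls inside it). The names ROWS, QUERY and run_query are hypothetical. Because col1 is still in the GROUP BY, the result has one row per (date, type) pair, so 2011-10-03 shows up twice rather than as a single pivoted row.

```python
import sqlite3

# Sample rows copied from the question's table.
ROWS = [
    (1, "type1", "2011-10-01 23:59:59.163-08"),
    (2, "type1", "2011-10-02 21:42:20.152-19"),
    (3, "type2", "2011-10-03 23:21:49.175-21"),
    (4, "type3", "2011-10-03 23:19:39.169-12"),
    (5, "type2", "2011-10-05 23:34:30.129-01"),
]

# SQLite approximation of the Hive query: CASE WHEN replaces if(),
# substr(..., 1, 10) replaces to_date(). col1 is added to ORDER BY only so
# that the two 2011-10-03 rows come back in a deterministic order.
QUERY = """
SELECT substr(date_ts, 1, 10) AS day,
       sum(CASE WHEN col1 = 'type1' THEN 1 ELSE 0 END) AS type_1,
       sum(CASE WHEN col1 = 'type2' THEN 1 ELSE 0 END) AS type_2,
       sum(CASE WHEN col1 = 'type3' THEN 1 ELSE 0 END) AS type_3
FROM table1
GROUP BY col1, substr(date_ts, 1, 10)
ORDER BY day, col1
"""


def run_query():
    """Load the sample rows into an in-memory table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id INTEGER, col1 TEXT, date_ts TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


for row in run_query():
    print(row)
```

With this sample data the query returns five rows, one per (date, type) group.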
Answer 1 (score: 1)
I removed the WHERE condition that filtered on dates, used a substring to extract the date part of the column, and did the GROUP BY on the date column only:
```sql
select substr(ltrim(date_ts),0,10) date_ts,
sum(if(col1 = 'type1', 1, 0)) as type_1,
sum(if(col1 = 'type2', 1, 0)) as type_2,
sum(if(col1 = 'type3', 1, 0)) as type_3
from table1
GROUP BY substr(ltrim(date_ts),0,10)
ORDER BY date_ts;
```
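A minimal sketch of this date-only grouping, replayed on the sample rows with Python's built-in sqlite3 module standing in for Hive (an assumption, not the poster's setup). SQLite has no if(), so CASE WHEN is used; SQLite's substr() is 1-based and a start of 0 would return only 9 characters, so the start is written as 1 here. The alias is renamed to the hypothetical `day` to avoid clashing with the date_ts column, and ROWS, QUERY and run_query are likewise hypothetical names.

```python
import sqlite3

# Sample rows copied from the question's table.
ROWS = [
    (1, "type1", "2011-10-01 23:59:59.163-08"),
    (2, "type1", "2011-10-02 21:42:20.152-19"),
    (3, "type2", "2011-10-03 23:21:49.175-21"),
    (4, "type3", "2011-10-03 23:19:39.169-12"),
    (5, "type2", "2011-10-05 23:34:30.129-01"),
]

# Conditional aggregation grouped only by the date part, so each date becomes
# one row and each type becomes one counted column (a pivot).
QUERY = """
SELECT substr(ltrim(date_ts), 1, 10) AS day,
       sum(CASE WHEN col1 = 'type1' THEN 1 ELSE 0 END) AS type_1,
       sum(CASE WHEN col1 = 'type2' THEN 1 ELSE 0 END) AS type_2,
       sum(CASE WHEN col1 = 'type3' THEN 1 ELSE 0 END) AS type_3
FROM table1
GROUP BY substr(ltrim(date_ts), 1, 10)
ORDER BY day
"""


def run_query():
    """Load the sample rows into an in-memory table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id INTEGER, col1 TEXT, date_ts TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


for row in run_query():
    print(row)
```

The four rows printed match the desired output table from the question.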
My output:
```
date       | type1 | type2 | type3 |
------------------------------------------
2011-10-01 | 1     | 0     | 0     |
2011-10-02 | 1     | 0     | 0     |
2011-10-03 | 0     | 1     | 1     |
2011-10-05 | 0     | 1     | 0     |
```