Hive multiple subqueries and GROUP BY

Asked: 2013-03-04 12:09:10

Tags: hive amazon-dynamodb emr hiveql

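I am moving my statistics from MySQL to Amazon DynamoDB and Elastic MapReduce.

The query below works with MySQL. I have the same table in Hive and need the same result as in MySQL (product views for the last week, last month and last year).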

SELECT product_id,
(SELECT COUNT(product_id) from dev_product_views_hive as P2 where P2.product_id=P.product_id and created >= DATE_SUB(NOW(), INTERVAL 1 WEEK)) as weekly,
(SELECT count(product_id) from dev_product_views_hive as P3 where P3.product_id=P.product_id and created >= DATE_SUB(NOW(), INTERVAL 1 MONTH)) as monthly,
(SELECT count(product_id) from dev_product_views_hive as P4 where P4.product_id=P.product_id and created >= DATE_SUB(NOW(), INTERVAL 1 YEAR)) as yearly
from dev_product_views_hive as P group by product_id;

I worked out how to get the result for the last month with Hive:

SELECT product_id, COUNT(product_id) as views from dev_product_views_hive WHERE created >= UNIX_TIMESTAMP(CONCAT(DATE_SUB(FROM_UNIXTIME(UNIX_TIMESTAMP()), 31)," ","00:00:00")) GROUP BY product_id;

But I need the results grouped into one row per product, as I get with MySQL:

product_id  views_last_week  views_last_month  views_last_year
2           564              2460              29967
4           980              3986              54982

Is it possible to do this with Hive?

Thanks in advance,

Amer

1 Answer:

Answer 0: (score: 1)

You can do this with case when combined with sum() or count().

For example:

select product_id, 
sum(case when created >= concat(date_sub(to_date(from_unixtime(unix_timestamp())), 7)," 00:00:00") then 1 else 0 end)  as weekly,
sum(case when created >= concat(date_sub(to_date(from_unixtime(unix_timestamp())), 31)," 00:00:00") then 1 else 0 end) as monthly,
sum(case when created >= concat(date_sub(to_date(from_unixtime(unix_timestamp())), 365)," 00:00:00") then 1 else 0 end) as yearly
from dev_product_views_hive 
group by product_id;
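
The query in the question wraps its cutoff in UNIX_TIMESTAMP(...), which suggests that created may hold a numeric Unix timestamp rather than a date string. The table schema is not shown, so this is an assumption. If it holds, a sketch of the same query with numeric cutoffs could look like this:

```sql
-- Assumption: created is a numeric Unix timestamp in seconds (schema not shown in the question).
-- unix_timestamp(string) parses 'yyyy-MM-dd HH:mm:ss' into seconds since the epoch.
select product_id,
sum(case when created >= unix_timestamp(concat(date_sub(to_date(from_unixtime(unix_timestamp())), 7), ' 00:00:00')) then 1 else 0 end) as weekly,
sum(case when created >= unix_timestamp(concat(date_sub(to_date(from_unixtime(unix_timestamp())), 31), ' 00:00:00')) then 1 else 0 end) as monthly,
sum(case when created >= unix_timestamp(concat(date_sub(to_date(from_unixtime(unix_timestamp())), 365), ' 00:00:00')) then 1 else 0 end) as yearly
from dev_product_views_hive
group by product_id;
```

If created is instead stored as a 'yyyy-MM-dd HH:mm:ss' string, the string comparison in the original query is the one to use.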

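concat(date_sub(to_date(from_unixtime(unix_timestamp())), days)," 00:00:00") returns a formatted date-time string for midnight on the day that is the given number of days before the current date.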
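
To see what this expression produces, it can help to evaluate it with a fixed date in place of the current one. The snippet below is only an illustration: the literal '2013-03-04' stands in for to_date(from_unixtime(unix_timestamp())), and the FROM clause with LIMIT 1 is there because older Hive versions do not accept a SELECT without a table.

```sql
-- Illustration with a fixed date instead of the current date.
-- date_sub('2013-03-04', 7) yields 2013-02-25, so cutoff is '2013-02-25 00:00:00'.
select concat(date_sub('2013-03-04', 7), ' 00:00:00') as cutoff
from dev_product_views_hive
limit 1;
```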

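case when returns 1 for each row whose created value is >= the cutoff date, and 0 otherwise, so sum() adds up the matching rows.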

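You can also do this with Hive's built-in count() function, which only counts rows for which the expression returns a non-NULL value. Without an else branch, case when returns NULL for rows that do not match: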

count(case when created >= concat(date_sub(to_date(from_unixtime(unix_timestamp())), 7)," 00:00:00") then 1 end)  as weekly
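
Put together for all three periods, the count() variant might look like the sketch below. It reuses the table and column names from the question and assumes, as the answer's query does, that created is comparable to a 'yyyy-MM-dd HH:mm:ss' string.

```sql
-- count() skips NULLs, and a case expression without an else branch
-- yields NULL for non-matching rows, so only rows past each cutoff are counted.
select product_id,
count(case when created >= concat(date_sub(to_date(from_unixtime(unix_timestamp())), 7), ' 00:00:00') then 1 end) as weekly,
count(case when created >= concat(date_sub(to_date(from_unixtime(unix_timestamp())), 31), ' 00:00:00') then 1 end) as monthly,
count(case when created >= concat(date_sub(to_date(from_unixtime(unix_timestamp())), 365), ' 00:00:00') then 1 end) as yearly
from dev_product_views_hive
group by product_id;
```

Both forms scan the table once, unlike the MySQL version, which runs three correlated subqueries per product.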