How to filter in the WHERE clause on a SELECT-list column that contains count / distinct / case / when

Posted: 2017-07-13 13:53:50

Tags: sql hadoop where-clause

Using SQL on Hadoop.

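I have a list of IDs, and I am trying to count the totals for two different guest-review data points. For guest_review_1 I have returned the total count. For guest_review_2 I have split the total count into 5 ranges.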

What I am struggling with is setting a filter in the WHERE clause on guest_review_1 so that I exclude properties whose total count is less than 5.

Any ideas for a workaround? Maybe a nested SELECT statement?

Here is an example of the query:

Select
id,
count(distinct guest_review_1) as Guest_Reviews,
count(distinct(case when guest_review_2 < 1 then guest_review_1 end)) as Group1,
Count(distinct(case when guest_review_2 >=2 AND guest_review_2 <3 then guest_review_1 end)) as Group2
From table_name
Where
guest_review_2 IS NOT NULL
AND guest_review_1 >=5  -- intent: exclude ids whose total count is below 5
AND date BETWEEN '2017-01-01' AND '2017-01-31'
Group By id

1 answer:

Answer 0 (score: 0)

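I'm not entirely sure what the group_1 and group_2 aggregations in your example query are meant to compute. However, the essence of your question seems to be how to filter the result set based on the result of an aggregate function (count), rather than on the values of individual input rows. WHERE is evaluated against each input row before grouping, so it cannot test an aggregate; HAVING is evaluated against each group after aggregation. Apache Hive supports this through the SQL HAVING clause.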
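A minimal sketch of that difference, using Python's standard-library sqlite3 module as a stand-in for Hive (an assumption: for a simple GROUP BY query like this, SQLite's WHERE/HAVING semantics match Hive's). The table name `reviews`, its rows, and the function `compare_where_and_having` are hypothetical. Filtering with `WHERE guest_review_1 >= 5`, as in the question's query, keeps id 2 with a shrunken count, while HAVING drops it:

```python
import sqlite3


def compare_where_and_having():
    """Return (where_rows, having_rows) for the same grouped count.

    SQLite stands in for Hive; the table `reviews` and its rows are
    hypothetical sample data.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reviews (id INTEGER, guest_review_1 INTEGER)")
    # id 1 has six distinct review values (1..6); id 2 has three (5, 6, 7).
    rows = [(1, v) for v in range(1, 7)] + [(2, v) for v in (5, 6, 7)]
    conn.executemany("INSERT INTO reviews VALUES (?, ?)", rows)

    # WHERE is applied to each input row before grouping, so it compares
    # individual review values with 5, not the per-id count.
    where_rows = conn.execute(
        """
        SELECT id, COUNT(DISTINCT guest_review_1) AS guest_reviews
        FROM reviews
        WHERE guest_review_1 >= 5
        GROUP BY id
        ORDER BY id
        """
    ).fetchall()

    # HAVING is applied to each group after aggregation, so it compares
    # the per-id distinct count with 5.
    having_rows = conn.execute(
        """
        SELECT id, COUNT(DISTINCT guest_review_1) AS guest_reviews
        FROM reviews
        GROUP BY id
        HAVING COUNT(DISTINCT guest_review_1) >= 5
        ORDER BY id
        """
    ).fetchall()
    conn.close()
    return where_rows, having_rows


if __name__ == "__main__":
    where_rows, having_rows = compare_where_and_having()
    print("WHERE :", where_rows)   # [(1, 2), (2, 3)]
    print("HAVING:", having_rows)  # [(1, 6)]
```

The HAVING clause here repeats the aggregate rather than relying on the `guest_reviews` alias, because not every SQL engine accepts SELECT-list aliases in HAVING.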

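In the following example, the input relation contains 6 rows with id set to 1 and 4 rows with id set to 2. The query includes the clause HAVING guest_reviews >= 5. Because of the HAVING clause, the result set contains only the row for id 1; no output row has id set to 2.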

WITH table_name AS (
    SELECT 1 AS id, 1 AS guest_review_1, 1 AS guest_review_2 UNION ALL
    SELECT 1 AS id, 2 AS guest_review_1, 2 AS guest_review_2 UNION ALL
    SELECT 1 AS id, 3 AS guest_review_1, 3 AS guest_review_2 UNION ALL
    SELECT 1 AS id, 4 AS guest_review_1, 4 AS guest_review_2 UNION ALL
    SELECT 1 AS id, 5 AS guest_review_1, 5 AS guest_review_2 UNION ALL
    SELECT 1 AS id, 6 AS guest_review_1, 6 AS guest_review_2 UNION ALL
    SELECT 2 AS id, 1 AS guest_review_1, 1 AS guest_review_2 UNION ALL
    SELECT 2 AS id, 2 AS guest_review_1, 2 AS guest_review_2 UNION ALL
    SELECT 2 AS id, 3 AS guest_review_1, 3 AS guest_review_2 UNION ALL
    SELECT 2 AS id, 4 AS guest_review_1, 4 AS guest_review_2
)
SELECT
    id,
    count(DISTINCT guest_review_1) AS guest_reviews,
    count(DISTINCT(CASE WHEN guest_review_2 < 1 THEN guest_review_1 END)) AS group_1,
    count(DISTINCT(CASE WHEN guest_review_2 >= 2 AND guest_review_2 < 3 THEN guest_review_1 END)) as group_2
FROM table_name
WHERE guest_review_2 IS NOT NULL
GROUP BY id
HAVING guest_reviews >= 5
;
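The question also asks about a nested SELECT. Aggregating inside a derived table and filtering with an outer WHERE is an equivalent alternative to HAVING. The sketch below reproduces the answer's sample data in SQLite through Python's sqlite3 module (again assuming SQLite behaves like Hive for these constructs) and checks that both forms return the same row. The names `SAMPLE_ROWS`, `AGGREGATES`, and `run_both` are hypothetical helpers. Hive also requires a derived table in FROM to have an alias, here `t`.

```python
import sqlite3

# The answer's sample data (id, guest_review_1, guest_review_2):
# id 1 has six rows, id 2 has four.
SAMPLE_ROWS = [(1, v, v) for v in range(1, 7)] + [(2, v, v) for v in range(1, 5)]

# The answer's aggregate query without its final filter (hypothetical constant).
AGGREGATES = """
    SELECT
        id,
        COUNT(DISTINCT guest_review_1) AS guest_reviews,
        COUNT(DISTINCT CASE WHEN guest_review_2 < 1 THEN guest_review_1 END) AS group_1,
        COUNT(DISTINCT CASE WHEN guest_review_2 >= 2 AND guest_review_2 < 3
                            THEN guest_review_1 END) AS group_2
    FROM table_name
    WHERE guest_review_2 IS NOT NULL
    GROUP BY id
"""


def run_both():
    """Return (having_rows, nested_rows) for the two filtering styles."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table_name "
        "(id INTEGER, guest_review_1 INTEGER, guest_review_2 INTEGER)"
    )
    conn.executemany("INSERT INTO table_name VALUES (?, ?, ?)", SAMPLE_ROWS)

    # Style 1: filter groups with HAVING after aggregation.
    having_rows = conn.execute(
        AGGREGATES + " HAVING COUNT(DISTINCT guest_review_1) >= 5 ORDER BY id"
    ).fetchall()

    # Style 2: aggregate in a derived table (alias t), then filter the
    # aggregated rows with an ordinary outer WHERE.
    nested_rows = conn.execute(
        "SELECT * FROM (" + AGGREGATES + ") t WHERE t.guest_reviews >= 5 ORDER BY t.id"
    ).fetchall()
    conn.close()
    return having_rows, nested_rows


if __name__ == "__main__":
    having_rows, nested_rows = run_both()
    print(having_rows)  # [(1, 6, 0, 1)]
    print(nested_rows)  # [(1, 6, 0, 1)]
```

Both forms return only id 1, with 6 distinct reviews, 0 in group_1 and 1 in group_2. HAVING is the more compact choice; the nested form is useful when the outer query needs to filter on several aggregated columns at once.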