In HiveQL, how do I count the rows sharing the same group2 within each group1, given columns id, group1 and group2?
+----+--------+--------+
| id | group1 | group2 |
+----+--------+--------+
|  1 | Z      | a      |
|  2 | Z      | a      |
|  3 | Z      | b      |
|  4 | Z      | c      |
|  5 | Y      | d      |
+----+--------+--------+
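The result should be 3. For group1 = Z, the pairs (Z,a), (Z,b) and (Z,c) occur 2, 1 and 1 times, so the maximum is 2. For group1 = Y, the only pair (Y,d) occurs once, so the maximum is 1. The answer is the sum of these maxima: 2 + 1 = 3.
I tried to code this in HiveQL using subqueries but had no success. Any hints would be greatly appreciated.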
Answer 0 (score: 0)
I finally figured it out using SQL Fiddle. I set up a toy example like this:
CREATE TABLE test
(`id` int, `group1` varchar(7), `group2` varchar(7));
INSERT INTO test
(`id`, `group1`, `group2`)
VALUES
(1, 'Z', 'a'),
(2, 'Z', 'a'),
(3, 'Z', 'b'),
(4, 'Z', 'c'),
(5, 'Y', 'd');
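Running the following query returns 3.
-- Innermost query: number of rows for each (group1, group2) pair.
-- Middle query: largest of those counts within each group1.
-- Outer query: sum of the per-group1 maxima.
SELECT SUM(d.max_val)
FROM (SELECT MAX(c.cnt) AS max_val, c.group1
      FROM (SELECT COUNT(*) AS cnt, group1, group2
            FROM test
            GROUP BY group1, group2) c
      GROUP BY c.group1) d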
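
To check the arithmetic, it can help to look at the per-group1 maxima before they are summed. The following is only a minimal sketch, assuming the test table created above; the aliases cnt and max_val are illustrative names, and the ORDER BY is there only to make the output order predictable.

```sql
-- Largest (group1, group2) row count within each group1, before summing.
SELECT c.group1, MAX(c.cnt) AS max_val
FROM (SELECT group1, group2, COUNT(*) AS cnt
      FROM test
      GROUP BY group1, group2) c
GROUP BY c.group1
ORDER BY group1;
```

With the five sample rows this should list Y with a maximum of 1 and Z with a maximum of 2, which add up to the expected 3.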