I have a table in Hive with the following data, which I'm experimenting with:
A B C D
==============
76 5 0.6 107777
78 5 0.5 107777
79 5 0.5 107777
79 5 0.5 107777
80 5 0.5 107777
210 5 0.5 107777
211 5 0.5 107777
213 5 0.5 107777
316 5 0.5 107777
316 5 0.5 107777
76 7 0.5 102997
78 7 0.5 102997
79 8 0.5 102997
79 8 0.5 102997
80 9 0.5 108997
80 9 0.5 108997
80 9 0.5 108997
I need to count the 'B and D' where B > 4 and C is not the same for B and D.
Expected output:
What I'm looking for: the value in 'C' must not repeat for the same values in 'A' and 'B'. I also want to show the duplicate values (occurring more than once) present in the table.
That means:
A B C
=====
76 5 0.6 => OK
78 5 0.5 => OK
79 5 0.5 => OK
79 5 0.5 => NOT OK (C = 0.5 should not repeat for the same A and B values)
80 5 0.5 => OK
...
A B C D
==============
79 5 0.5 107777
316 5 0.5 107777
79 8 0.5 102997
80 9 0.5 108997
80 9 0.5 108997
Count: 5
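Reading the expected output above, the five listed rows appear to be the second and later copies of rows whose A, B, C and D values are all identical, which would make the count 17 qualifying rows minus 12 distinct rows = 5. That interpretation is an assumption, not something the question states outright. The sketch below checks it on the sample data using Python's built-in sqlite3 module as a stand-in for Hive (window functions need SQLite 3.25 or newer). The table name table1 is taken from the answer; rowid is SQLite-specific and is used only to keep the row order stable, so in Hive you would order by a real column instead.

```python
import sqlite3

# Sample rows from the question, as (A, B, C, D).
ROWS = [
    (76, 5, 0.6, 107777), (78, 5, 0.5, 107777), (79, 5, 0.5, 107777),
    (79, 5, 0.5, 107777), (80, 5, 0.5, 107777), (210, 5, 0.5, 107777),
    (211, 5, 0.5, 107777), (213, 5, 0.5, 107777), (316, 5, 0.5, 107777),
    (316, 5, 0.5, 107777), (76, 7, 0.5, 102997), (78, 7, 0.5, 102997),
    (79, 8, 0.5, 102997), (79, 8, 0.5, 102997), (80, 9, 0.5, 108997),
    (80, 9, 0.5, 108997), (80, 9, 0.5, 108997),
]

# Number each copy of an identical (A, B, C, D) row; every copy after the
# first (rn > 1) is a repeat. rowid (SQLite-only) just keeps the numbering
# and the output order stable for this demo.
EXTRA_COPIES_SQL = """
SELECT A, B, C, D
FROM (
    SELECT A, B, C, D, rowid AS rid,
           ROW_NUMBER() OVER (PARTITION BY A, B, C, D ORDER BY rowid) AS rn
    FROM table1
    WHERE B > 4
) AS t
WHERE rn > 1
ORDER BY rid
"""

# Cross-check: repeats = all qualifying rows - distinct qualifying rows.
COUNT_SQL = """
SELECT (SELECT COUNT(*) FROM table1 WHERE B > 4)
     - (SELECT COUNT(*) FROM (SELECT DISTINCT A, B, C, D FROM table1 WHERE B > 4))
"""


def find_extra_copies():
    """Load the sample rows into an in-memory table and return (repeats, count)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (A INTEGER, B INTEGER, C REAL, D INTEGER)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", ROWS)
    extra = conn.execute(EXTRA_COPIES_SQL).fetchall()
    count = conn.execute(COUNT_SQL).fetchone()[0]
    conn.close()
    return extra, count


extra_rows, repeat_count = find_extra_copies()
for row in extra_rows:
    print(*row)
print("Count:", repeat_count)
```

Under this reading, the rows printed are the same five rows, in the same order, as the expected output.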
I was able to write the query for the first part, but haven't made any progress on the rest:
SELECT A,B,C,D FROM DB.TABLE1 WHERE B >1;
But I can't figure out how to write the last part:
count the 'B and D' where B > 4 and C is not the same for B and D.
Any suggestions would be very helpful.
Update_1:
I tried the following:
SELECT A,B,C,D FROM (SELECT * FROM TABLE1 WHERE B >4) t1 GROUP BY B,D HAVING countnum>1 LIMIT 20;
but got this error:
FAILED: SemanticException [Error 10025]: Line 1:197 Expression not in GROUP BY key '1'
hive>
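The Update_1 query fails because A and C appear in the SELECT list but are neither listed in GROUP BY nor wrapped in an aggregate, which is what Error 10025 reports; countnum is also not defined anywhere in that query. A minimal sketch of a GROUP BY form that avoids both problems, assuming "duplicate" means identical in all four columns: group by every selected column, compute COUNT(*), and keep the groups with HAVING COUNT(*) > 1. It runs against the sample data with Python's sqlite3 as a stand-in for Hive; the GROUP BY/HAVING part is plain SQL that Hive also accepts.

```python
import sqlite3

# Sample rows from the question, as (A, B, C, D).
ROWS = [
    (76, 5, 0.6, 107777), (78, 5, 0.5, 107777), (79, 5, 0.5, 107777),
    (79, 5, 0.5, 107777), (80, 5, 0.5, 107777), (210, 5, 0.5, 107777),
    (211, 5, 0.5, 107777), (213, 5, 0.5, 107777), (316, 5, 0.5, 107777),
    (316, 5, 0.5, 107777), (76, 7, 0.5, 102997), (78, 7, 0.5, 102997),
    (79, 8, 0.5, 102997), (79, 8, 0.5, 102997), (80, 9, 0.5, 108997),
    (80, 9, 0.5, 108997), (80, 9, 0.5, 108997),
]

# Every non-aggregated column in SELECT also appears in GROUP BY, so the
# "Expression not in GROUP BY key" error cannot occur. HAVING keeps only
# combinations that occur more than once.
DUPLICATES_SQL = """
SELECT A, B, C, D, COUNT(*) AS cnt
FROM table1
WHERE B > 4
GROUP BY A, B, C, D
HAVING COUNT(*) > 1
ORDER BY B, A
"""


def duplicate_groups():
    """Return (A, B, C, D, cnt) for every combination seen more than once."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (A INTEGER, B INTEGER, C REAL, D INTEGER)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", ROWS)
    result = conn.execute(DUPLICATES_SQL).fetchall()
    conn.close()
    return result


groups = duplicate_groups()
for g in groups:
    print(*g)

# Each group contributes (cnt - 1) copies beyond its first occurrence.
extra = sum(cnt - 1 for *_, cnt in groups)
print("Count:", extra)
```

Summing cnt - 1 over the four duplicated groups gives 1 + 1 + 1 + 2 = 5 repeated copies for this data.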
Answer 0 (score: 0)
Need to count the 'B and D' where B > 4 and C is not the same for B and D.
Input: table1
A B C D
==============
76 5 0.6 107777
78 5 0.5 107777
79 5 0.5 107777
79 5 0.5 107777
80 5 0.5 107777
210 5 0.5 107777
211 5 0.5 107777
213 5 0.5 107777
316 5 0.5 107777
316 5 0.5 107777
76 7 0.5 102997
78 7 0.5 102997
79 8 0.5 102997
79 8 0.5 102997
80 9 0.5 108997
80 9 0.5 108997
80 9 0.5 108997
Query:
select count(*)
from (
select *, row_number() over (partition by B, C, D order by A) as rn
from table1
where B>4
) as t1
where rn=1;
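Output: 5
Explanation: row_number() numbers the rows within each partition defined by the values of B, C and D. The number restarts at 1 for each new combination of these three columns and keeps incrementing for rows that share the same values, so keeping only the rows with rn = 1 counts the distinct (B, C, D) combinations: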
A B C D rn
======================
76 5 0.6 107777 1
78 5 0.5 107777 1
79 5 0.5 107777 2
79 5 0.5 107777 3
80 5 0.5 107777 4
210 5 0.5 107777 5
211 5 0.5 107777 6
213 5 0.5 107777 7
316 5 0.5 107777 8
316 5 0.5 107777 9
76 7 0.5 102997 1
78 7 0.5 102997 2
79 8 0.5 102997 1
79 8 0.5 102997 2
80 9 0.5 108997 1
80 9 0.5 108997 2
80 9 0.5 108997 3
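Since row_number() restarts at 1 in every (B, C, D) partition, keeping rn = 1 leaves exactly one row per distinct (B, C, D) combination, so the query should return the same number as a SELECT DISTINCT count. A sketch that checks this on the sample data, again with Python's sqlite3 standing in for Hive (SQLite 3.25+ for window functions). ORDER BY A inside OVER, with the SQLite-only rowid breaking ties between identical rows, is there only so the rn values are reproducible and line up with the table above.

```python
import sqlite3

# Sample rows from the question, as (A, B, C, D).
ROWS = [
    (76, 5, 0.6, 107777), (78, 5, 0.5, 107777), (79, 5, 0.5, 107777),
    (79, 5, 0.5, 107777), (80, 5, 0.5, 107777), (210, 5, 0.5, 107777),
    (211, 5, 0.5, 107777), (213, 5, 0.5, 107777), (316, 5, 0.5, 107777),
    (316, 5, 0.5, 107777), (76, 7, 0.5, 102997), (78, 7, 0.5, 102997),
    (79, 8, 0.5, 102997), (79, 8, 0.5, 102997), (80, 9, 0.5, 108997),
    (80, 9, 0.5, 108997), (80, 9, 0.5, 108997),
]

# Row numbers per (B, C, D) partition, listed in insertion order.
NUMBERED_SQL = """
SELECT A, B, C, D,
       ROW_NUMBER() OVER (PARTITION BY B, C, D ORDER BY A, rowid) AS rn
FROM table1
WHERE B > 4
ORDER BY rowid
"""

# The answer's counting query: one rn = 1 row survives per partition.
COUNT_RN1_SQL = """
SELECT COUNT(*)
FROM (
    SELECT B, C, D,
           ROW_NUMBER() OVER (PARTITION BY B, C, D ORDER BY A) AS rn
    FROM table1
    WHERE B > 4
) AS t1
WHERE rn = 1
"""

# Equivalent count without a window function.
COUNT_DISTINCT_SQL = """
SELECT COUNT(*) FROM (SELECT DISTINCT B, C, D FROM table1 WHERE B > 4)
"""


def run_answer_query():
    """Return (numbered rows, rn = 1 count, distinct (B, C, D) count)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (A INTEGER, B INTEGER, C REAL, D INTEGER)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", ROWS)
    numbered_rows = conn.execute(NUMBERED_SQL).fetchall()
    rn1 = conn.execute(COUNT_RN1_SQL).fetchone()[0]
    distinct = conn.execute(COUNT_DISTINCT_SQL).fetchone()[0]
    conn.close()
    return numbered_rows, rn1, distinct


numbered, via_rn, via_distinct = run_answer_query()
for row in numbered:
    print(*row)
print("rn = 1 count:", via_rn, "| distinct (B, C, D):", via_distinct)
```

On this sample, both the distinct (B, C, D) count and the number of repeated (A, B, C, D) rows come out to 5, so testing against data where the two differ would show which quantity a given query is measuring.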