Hive: Query a table for rows matching a condition

Time: 2018-09-26 19:35:23

Tags: hive hiveql

I have a table in Hive with the following data, which I am trying to query:

A   B   C   D 
==============
76  5   0.6 107777
78  5   0.5 107777
79  5   0.5 107777
79  5   0.5 107777
80  5   0.5 107777
210 5   0.5 107777
211 5   0.5 107777
213 5   0.5 107777
316 5   0.5 107777
316 5   0.5 107777
76  7   0.5 102997
78  7   0.5 102997
79  8   0.5 102997
79  8   0.5 102997
80  9   0.5 108997
80  9   0.5 108997
80  9   0.5 108997


I need to count the 'B and D' when B>4 and C is not the same for B and D.

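Expected O/P:

Here I am looking for the values where, for the same value in 'A' and 'B', the value in 'C' is not present. The repeated values (occurring more than once) in the table should also be displayed.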

Value in 'C' is not present for the same value in 'A' and 'B':

That means:

A B C
=====
76 5 0.6 => OK
78 5 0.5 => OK
79 5 0.5 => OK 
79 5 0.5 => NOT OK (C=0.5 should not repeat for the same A and B values)
80 5 0.5 => OK.....




A   B   C   D 
==============
79  5   0.5 107777
316 5   0.5 107777
79  8   0.5 102997
80  9   0.5 108997
80  9   0.5 108997

Count: 5

I was able to write the query for the first part, but have made no progress on the rest:

SELECT A,B,C,D FROM DB.TABLE1 WHERE B >1; 

But I can't work out how to write the last part:

count the 'B and D' when B>4 and C is not same for B and D.

Any suggestions on this would be very helpful.

Update_1:

I tried the following:

 SELECT A,B,C,D FROM (SELECT * FROM TABLE1 WHERE B >4) t1 GROUP BY B,D HAVING countnum>1 LIMIT 20;

But I got an error:

FAILED: SemanticException [Error 10025]: Line 1:197 Expression not in GROUP BY key '1'
hive> 
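
For context on the error above: Hive raises SemanticException 10025 when a selected column is neither listed in GROUP BY nor wrapped in an aggregate function, and the attempt also refers to countnum, which is not defined anywhere in the query. A minimal sketch of a grouped query that avoids both problems is shown below. It assumes that "repeated" means rows with identical A, B, C and D; the alias cnt is illustrative only, and the table name table1 is taken from the answer.

```sql
-- Sketch only: assumes duplicates are rows sharing the same A, B, C and D.
-- Every non-aggregated column in SELECT also appears in GROUP BY,
-- and the count is computed with COUNT(*) instead of an undefined name.
SELECT A, B, C, D, COUNT(*) AS cnt
FROM table1
WHERE B > 4
GROUP BY A, B, C, D
HAVING COUNT(*) > 1;
```

On the sample data this returns one line per repeated combination (A=79/B=5, A=316/B=5, A=79/B=8 and A=80/B=9) together with the number of times it occurs.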

1 Answer:

Answer 0 (score: 0)

Need to count the 'B and D' when B>4 and C is not same for B and D.

Input: table1

A   B   C   D 
==============
76  5   0.6 107777
78  5   0.5 107777
79  5   0.5 107777
79  5   0.5 107777
80  5   0.5 107777
210 5   0.5 107777
211 5   0.5 107777
213 5   0.5 107777
316 5   0.5 107777
316 5   0.5 107777
76  7   0.5 102997
78  7   0.5 102997
79  8   0.5 102997
79  8   0.5 102997
80  9   0.5 108997
80  9   0.5 108997
80  9   0.5 108997

Query:

select count(*)
from (
  select *, row_number() over (partition by B, C, D) as rn
  from table1
  where B>4
) as t1
where rn=1;

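Output: 5

Explanation: row_number() assigns row numbers within each group of rows that share the same values of B, C and D. For rows with identical values in these three columns, the row number keeps incrementing, so keeping only rn=1 leaves exactly one row per distinct (B, C, D) combination, and count(*) counts those combinations.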

A   B   C   D       rn
======================
76  5   0.6 107777  1
78  5   0.5 107777  1
79  5   0.5 107777  2
79  5   0.5 107777  3
80  5   0.5 107777  4
210 5   0.5 107777  5
211 5   0.5 107777  6
213 5   0.5 107777  7
316 5   0.5 107777  8
316 5   0.5 107777  9
76  7   0.5 102997  1
78  7   0.5 102997  2
79  8   0.5 102997  1
79  8   0.5 102997  2
80  9   0.5 108997  1
80  9   0.5 108997  2
80  9   0.5 108997  3
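
The query above counts distinct (B, C, D) combinations, which happens to give 5 on this data. If the goal is instead to list the repeated rows shown in the question's expected output, one possible variation is to partition by all four columns and keep the rows numbered above 1. This is a sketch under the assumption that a "repeated" row is one whose A, B, C and D all match an earlier row; the ORDER BY inside the window is added only to give the window an explicit ordering.

```sql
-- Sketch only: assumes a duplicate is a row whose A, B, C and D
-- all match another row. The first row of each group gets rn = 1,
-- so rn > 1 keeps only the extra (repeated) copies.
SELECT A, B, C, D
FROM (
  SELECT A, B, C, D,
         row_number() OVER (PARTITION BY A, B, C, D ORDER BY A) AS rn
  FROM table1
  WHERE B > 4
) t1
WHERE rn > 1;
```

On the sample data this keeps five rows: one extra copy each for A=79/B=5, A=316/B=5 and A=79/B=8, and two extra copies for A=80/B=9. Replacing the outer SELECT list with count(*) returns the count of 5 directly. The order of the returned rows is not guaranteed.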