Hive - Select data from groups that contain at least one matching row

Time: 2017-07-06 09:33:47

Tags: sql hive hiveql

Suppose I have a table in Hive like the following:

|Id|Data |Data2 |Groupkey|
|1 |One  |      |Group1  |
|2 |Two  |Stuff |Group1  |
|3 |Shoes|Some  |Group2  |
|4 |four |Stuff |Group2  |
|5 |Three|Notme |Group3  |

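For every group that contains 'Stuff' in Data2, I want the Groupkey, the Data value from the row other than the 'Stuff' row, and, as Data2, the Data value from the 'Stuff' row.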

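So the resulting dataset would look like:

|Group |Data |Data2|
|Group1|One  |Two  |
|Group2|Shoes|four |

I was hoping to get there with GROUP BY, but my attempt fails with an error indicating that I need to include Data in the GROUP BY, and that is not what I want to group on.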

I am also not sure how to select only the groups that contain a row with certain data.

2 Answers:

Answer 0: (Score: 0)

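-- Self-join: t2 is the 'Stuff' row of a group, t1 is any other row of the same group.
-- Groupkey is qualified with t1 because the bare name is ambiguous after the join.
SELECT DISTINCT t1.Groupkey, t1.Data, t2.Data AS Data2
FROM t t1
INNER JOIN t t2
ON t1.Groupkey = t2.Groupkey
AND t1.Data2 <> t2.Data2
WHERE t2.Data2 = 'Stuff'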
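
One thing to watch with this join: if the blank Data2 cell in row 1 is actually NULL rather than an empty string (the question does not say which), the comparison `t1.Data2 <> t2.Data2` evaluates to NULL and that row drops out of the result. Also, as far as I know, Hive releases before 2.2.0 only accept equality conditions in the ON clause. A minimal sketch that sidesteps both, assuming the same table name `t` as above and that NULL is possible, moves the non-equality test into WHERE:

```sql
-- Sketch only: assumes Data2 may be NULL for non-'Stuff' rows.
SELECT DISTINCT t1.Groupkey, t1.Data, t2.Data AS Data2
FROM t t1
INNER JOIN t t2
  ON t1.Groupkey = t2.Groupkey          -- equality only, so older Hive accepts it
WHERE t2.Data2 = 'Stuff'                -- t2 is the 'Stuff' row of the group
  AND (t1.Data2 IS NULL                 -- t1 is any row that is not a 'Stuff' row,
       OR t1.Data2 <> 'Stuff')          -- including rows with no Data2 at all
```

Because it is an inner join against the 'Stuff' rows, groups without any 'Stuff' row (Group3 in the sample) still produce no output.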

Answer 1: (Score: 0)

select      Groupkey                                            as `Group`
           ,min (case when Data2 <> 'Stuff' then Data end)      as Data
           ,min (case when Data2 =  'Stuff' then Data end)      as Data2

from        MyTable

group by    Groupkey

having      count (case when Data2 = 'Stuff' then 1 end) > 0
;
+--------+-------+-------+
| group  | data  | data2 |
+--------+-------+-------+
| Group1 | One   | Two   |
| Group2 | Shoes | four  |
+--------+-------+-------+
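
The output above relies on the blank Data2 in row 1 being an empty string. If it were NULL, `Data2 <> 'Stuff'` would not be true for that row, the CASE would return NULL, and Data for Group1 would come back as NULL. A hedged variant of the same conditional aggregation, assuming NULL is possible and keeping the `MyTable` name used above:

```sql
-- Sketch only: same query as above, but rows with a NULL Data2 count as non-'Stuff' rows.
select      Groupkey                                                              as `Group`
           ,min (case when Data2 is null or Data2 <> 'Stuff' then Data end)       as Data
           ,min (case when Data2 = 'Stuff' then Data end)                         as Data2
from        MyTable
group by    Groupkey
-- keep only the groups that contain at least one 'Stuff' row
having      count (case when Data2 = 'Stuff' then 1 end) > 0
;
```

Note that `min` picks a single value per group, so a group with several non-'Stuff' rows still yields one row, whereas the self-join approach returns one row for each of them.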