Question

Suppose I have a table in Hive as follows:

|Id|Data |Data2 |Groupkey|
|1 |One  |      |Group1  |
|2 |Two  |Stuff |Group1  |
|3 |Shoes|Some  |Group2  |
|4 |four |Stuff |Group2  |
|5 |Three|Notme |Group3  |

For each group that contains 'Stuff' in Data2, I want one row with the Groupkey, the Data from the row other than the 'Stuff' row, and the Data from the 'Stuff' row.

So the resulting dataset would look like

|Groupkey|Data |Data2|
|Group1  |One  |Two  |
|Group2  |Shoes|four |

I tried GROUP BY, but that fails with a complaint that I need to include Data in the GROUP BY, and Data is not what I want to group on.

I am also not sure how to select only the groups that contain a row with certain data.

Answer 1

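-- Self-join: t2 is the 'Stuff' row, t1 is the other row of the same group.
-- Groupkey is qualified with t1 because it exists on both sides of the join.
SELECT DISTINCT t1.Groupkey, t1.Data, t2.Data AS Data2
FROM t t1
INNER JOIN t t2
ON t1.Groupkey = t2.Groupkey
AND t1.Data2 <> t2.Data2
WHERE t2.Data2 = 'Stuff'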
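
The self-join above depends on how the blank Data2 cell of row 1 is stored: `<>` is never true when one side is NULL, so a NULL blank would silently drop Group1. Below is a minimal sketch for checking that locally. It assumes Python's built-in sqlite3 module as a stand-in for Hive, so it only checks the SQL logic, not Hive itself. The helper `run_self_join` and the `t1.Id <> t2.Id` condition are illustrative and not part of the original answer.

```python
import sqlite3

# Sample rows from the question; the blank Data2 cell is stored as ''.
ROWS = [
    (1, "One", "", "Group1"),
    (2, "Two", "Stuff", "Group1"),
    (3, "Shoes", "Some", "Group2"),
    (4, "four", "Stuff", "Group2"),
    (5, "Three", "Notme", "Group3"),
]

# Same shape as the self-join above; only the "other row" condition varies.
SELF_JOIN = """
SELECT DISTINCT t1.Groupkey, t1.Data, t2.Data AS Data2
FROM t t1
INNER JOIN t t2
  ON t1.Groupkey = t2.Groupkey
 AND {other_row}
WHERE t2.Data2 = 'Stuff'
ORDER BY t1.Groupkey
"""


def run_self_join(rows, other_row):
    """Load rows into an in-memory table t and run the self-join."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (Id INTEGER, Data TEXT, Data2 TEXT, Groupkey TEXT)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(SELF_JOIN.format(other_row=other_row)).fetchall()
    conn.close()
    return result


# Variant of the data where the blank cell is NULL instead of ''.
rows_with_null = [(i, d, d2 or None, g) for (i, d, d2, g) in ROWS]

by_data2_blank = run_self_join(ROWS, "t1.Data2 <> t2.Data2")
by_data2_null = run_self_join(rows_with_null, "t1.Data2 <> t2.Data2")
by_id_null = run_self_join(rows_with_null, "t1.Id <> t2.Id")

print(by_data2_blank)
print(by_data2_null)
print(by_id_null)
```

Comparing on Id avoids the NULL problem, but it behaves differently when a group holds more than one 'Stuff' row, since it would then also pair the 'Stuff' rows with each other.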

Answer 2

select      Groupkey                                            as `Group`
           ,min (case when Data2 <> 'Stuff' then Data end)      as Data
           ,min (case when Data2 =  'Stuff' then Data end)      as Data2

from        MyTable

group by    Groupkey

having      count (case when Data2 = 'Stuff' then 1 end) > 0
;

+--------+-------+-------+
| group  | data  | data2 |
+--------+-------+-------+
| Group1 | One   | Two   |
| Group2 | Shoes | four  |
+--------+-------+-------+
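
The conditional aggregation above can be checked without a Hive cluster. The following is a rough sketch that assumes Python's built-in sqlite3 module as a stand-in for Hive and stores the blank Data2 cell as an empty string. The alias is written as "Group" in double quotes rather than Hive's backticks, and the ORDER BY is added only to make the row order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (Id INTEGER, Data TEXT, Data2 TEXT, Groupkey TEXT)"
)
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [
        (1, "One", "", "Group1"),
        (2, "Two", "Stuff", "Group1"),
        (3, "Shoes", "Some", "Group2"),
        (4, "four", "Stuff", "Group2"),
        (5, "Three", "Notme", "Group3"),
    ],
)

# min() skips the NULLs produced by the CASE expressions, so each group
# collapses to one row; HAVING keeps only groups with a 'Stuff' row.
query = """
select   Groupkey                                       as "Group",
         min(case when Data2 <> 'Stuff' then Data end)  as Data,
         min(case when Data2 =  'Stuff' then Data end)  as Data2
from     MyTable
group by Groupkey
having   count(case when Data2 = 'Stuff' then 1 end) > 0
order by Groupkey
"""

result = conn.execute(query).fetchall()
conn.close()

for row in result:
    print(row)
```

Group3 has no 'Stuff' row, so the HAVING clause filters it out.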

Hive - Select data from groups that contain at least one row

2 answers: