Suppose I have a table in Hive like this:
|Id|Data |Data2 |Groupkey|
|1 |One | |Group1 |
|2 |Two |Stuff |Group1 |
|3 |Shoes|Some |Group2 |
|4 |four |Stuff |Group2 |
|5 |Three|Notme |Group3 |
For each group that contains 'Stuff' in Data2, I want a single row with the Groupkey, the Data from the row other than the 'Stuff' row, and the Data from the 'Stuff' row.
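So the resulting dataset would look like:
|Group |Data |Data2|
|Group1|One |Two |
|Group2|Shoes|four |
I was hoping to get this working with GROUP BY, but my attempt fails with an error indicating that I need to include Data in the GROUP BY, even though that is not what I want to group by.
I'm also not sure how to select only the groups that contain a row with certain data.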
Answer 0 (score: 0)
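-- Self-join: t2 is the 'Stuff' row, t1 is any other row in the same group.
SELECT DISTINCT t1.Groupkey, t1.Data, t2.Data AS Data2
FROM t t1
INNER JOIN t t2
  ON t1.Groupkey = t2.Groupkey
WHERE t2.Data2 = 'Stuff'
  -- Data2 may be NULL (as the empty cell for Id 1 might be), so test for it explicitly.
  AND (t1.Data2 IS NULL OR t1.Data2 <> 'Stuff')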
Answer 1 (score: 0)
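select Groupkey as `Group`
      -- Data from the non-'Stuff' row; the null check keeps rows whose Data2 is NULL
      ,min (case when Data2 is null or Data2 <> 'Stuff' then Data end) as Data
      -- Data from the 'Stuff' row
      ,min (case when Data2 = 'Stuff' then Data end) as Data2
from MyTable
group by Groupkey
-- keep only groups that contain at least one 'Stuff' row
having count (case when Data2 = 'Stuff' then 1 end) > 0
;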
+--------+-------+-------+
| group | data | data2 |
+--------+-------+-------+
| Group1 | One | Two |
| Group2 | Shoes | four |
+--------+-------+-------+
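If you want to try the conditional-aggregation approach without a Hive cluster, here is a minimal sketch that runs the same kind of query against the sample rows in an in-memory SQLite database. SQLite is only a stand-in for Hive here, and a few things are my assumptions rather than anything from the question: the empty Data2 cell for Id 1 is treated as NULL, and the function name `stuff_groups` and the aliases `grp`, `other_data` and `stuff_data` are made up for illustration.

```python
import sqlite3

# Sample rows from the question (Id, Data, Data2, Groupkey).
# Assumption: the empty Data2 cell for Id 1 is NULL.
ROWS = [
    (1, "One", None, "Group1"),
    (2, "Two", "Stuff", "Group1"),
    (3, "Shoes", "Some", "Group2"),
    (4, "four", "Stuff", "Group2"),
    (5, "Three", "Notme", "Group3"),
]

# Conditional aggregation: one output row per group that has a 'Stuff' row.
# The aliases are hypothetical names chosen not to clash with column names.
QUERY = """
SELECT Groupkey AS grp,
       MIN(CASE WHEN Data2 IS NULL OR Data2 <> 'Stuff' THEN Data END) AS other_data,
       MIN(CASE WHEN Data2 = 'Stuff' THEN Data END) AS stuff_data
FROM MyTable
GROUP BY Groupkey
HAVING COUNT(CASE WHEN Data2 = 'Stuff' THEN 1 END) > 0
ORDER BY Groupkey
"""


def stuff_groups(rows):
    """Load rows into an in-memory SQLite table and run the aggregation."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE MyTable (Id INTEGER, Data TEXT, Data2 TEXT, Groupkey TEXT)"
        )
        conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in stuff_groups(ROWS):
        print(row)
```

Group3 drops out because it has no 'Stuff' row, so the HAVING count is 0. Note that MIN just picks one value, so if a group had several non-'Stuff' rows you would only get the smallest Data from them.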