Getting the intersection of data in Hive

Time: 2012-11-09 10:46:33

Tags: hive

I have the following data in Hive:

userid cityid
1      15
2      15
1      7
3      15
2      8
3      9
3      7

I want to keep only the user IDs that have both cityid 15 and cityid 7 (in my example, that would be userids 1 and 3). I tried:

select userid from table where cityid = 15 and userid in (select userid from table where cityid = 7);

But this does not work in Hive. Can anyone help?

Thanks!

3 answers:

Answer 0 (score: 2)

OK, I found out how to do it:

select a.userid from (select userid from table where cityid = 15) a join (select userid from table where cityid = 7) b on a.userid = b.userid;
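For context, the original attempt most likely failed because older Hive releases did not accept an IN subquery in the WHERE clause (as far as I know, that support only arrived around Hive 0.13). Hive's usual substitute is LEFT SEMI JOIN. Below is a minimal sketch of the same intersection written that way; `table_name` is a placeholder for the real table, and I have not run this against the asker's data.

```sql
-- Sketch only: table_name is a placeholder for the real (userid, cityid) table.
-- Keep users that have a cityid = 15 row and at least one matching cityid = 7 row.
SELECT DISTINCT a.userid
FROM table_name a
LEFT SEMI JOIN table_name b
  ON (a.userid = b.userid AND b.cityid = 7)  -- the right-hand table may only be referenced in ON
WHERE a.cityid = 15;
```

A semi join emits each left-hand row at most once, so unlike a plain join it does not multiply rows when a user has several cityid = 7 entries. DISTINCT only guards against duplicate (userid, 15) rows in the table itself.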

Answer 1 (score: 1)

SELECT DISTINCT userid FROM table_name WHERE cityid == 15 OR cityid == 7;
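Note that the OR filter above matches users who have either city, so on the sample data it also returns userid 2 (who has cityid 15 but not 7). A sketch of a variant that requires both cities, assuming a Hive version that supports HAVING and using `table_name` as in the query above:

```sql
-- Sketch only: keep users that have BOTH cityid 15 and cityid 7.
SELECT userid
FROM table_name
WHERE cityid IN (15, 7)             -- ignore all other cities
GROUP BY userid
HAVING COUNT(DISTINCT cityid) = 2;  -- both of the two wanted cities must be present
```

On the sample data this should leave userids 1 and 3, and it reads the table only once.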

Answer 2 (score: 1)

Try to avoid the self-join:

SELECT userid
FROM
 ( SELECT userid, collect_set( cityid ) as cities
   FROM table
   GROUP BY userid
 ) t  -- Hive requires an alias on a subquery in FROM
WHERE array_contains( cities, 7 )
AND array_contains( cities, 15 );