Question

work(id, rank)

Data:

work
------------------
1 | A
1 | B
1 | C
1 | D
2 | A
2 | C
2 | B
3 | C

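I need to find all pairs of IDs that have ranks in common, along with the count of those ranks. A pair should be displayed only when the number of ranks is greater than 2, and the output should be printed in descending order. I wrote a MySQL query for this, but I am new to SparkSQL and HiveQL, so please help me with how to do it. For example, for the data above the result set should be:

---------------------
id1 | id2 | Count
---------------------
 A  | B   |  3
 B  | C   |  3

The MySQL query is:

select a.id,b.id
from work as a, work as b
where a.id>b.id
group by a.id,b.id having group_concat(distinct a.rank order by a.rank)=group_concat(distinct b.rank order by b.rank)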
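
To make the intent of the MySQL query concrete, here is a minimal Python sketch of the comparison it performs: two ids are paired when their sets of distinct ranks are identical, which is what comparing the two `group_concat(distinct ... order by ...)` strings amounts to. The function name `pairs_with_identical_ranks` and the rows in `SAMPLE_ROWS` are hypothetical and are not the data from the question. Unlike the query, the sketch lists the smaller id first, reports the rank count, and sorts by that count in descending order.

```python
from collections import defaultdict
from itertools import combinations


def pairs_with_identical_ranks(rows):
    """Return (id1, id2, count) for every pair of ids with exactly the same ranks.

    rows is an iterable of (id, rank) tuples, mirroring work(id, rank).
    """
    ranks_by_id = defaultdict(set)
    for id_, rank in rows:
        # The set plays the role of group_concat(distinct rank order by rank).
        ranks_by_id[id_].add(rank)

    pairs = []
    for id1, id2 in combinations(sorted(ranks_by_id), 2):
        if ranks_by_id[id1] == ranks_by_id[id2]:
            pairs.append((id1, id2, len(ranks_by_id[id1])))

    # Largest rank count first (stable sort keeps id order within ties).
    return sorted(pairs, key=lambda pair: -pair[2])


# Hypothetical rows for illustration only, not the data from the question.
SAMPLE_ROWS = [
    ("x", 1), ("x", 2), ("x", 3),
    ("y", 1), ("y", 2), ("y", 3),
    ("z", 1), ("z", 2),
    ("w", 1), ("w", 2),
]

if __name__ == "__main__":
    print(pairs_with_identical_ranks(SAMPLE_ROWS))
```

The requirement to show only pairs with more than 2 ranks is not part of the MySQL query, so the sketch leaves it out as well.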

Answer 1

I don't think Hive supports group_concat(). I think this does the same thing:

select a.id, b.id, a.cnt
from (select a.*, count(*) over (partition by a.id) as cnt
      from work a
     ) a join
     (select b.*, count(*) over (partition by b.id) as cnt
      from work b
     ) b
     on a.rank = b.rank and a.cnt = b.cnt
where a.id < b.id   -- I *think* this is allowed in Hive; if not, a subquery or expression in the `having` clause will do the same thing
group by a.id, b.id, a.cnt
having count(*) = a.cnt;

This is a more natural way to get ids with the same ranks. In fact, it should be more efficient than the MySQL version in almost any database, because the MySQL version's join, which is constrained only by a.id > b.id, would generate a large amount of data.
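
One way to sanity-check the logic of the rewritten query without a Hive cluster is to run it against an in-memory SQLite database from Python. This is only a sketch under several assumptions: SQLite is not Hive, so it says nothing about Hive syntax support; the bundled SQLite must be 3.25 or newer for window functions; the helper name `run_on_sqlite` and the rows in `HYPOTHETICAL_ROWS` are made up for illustration; and each (id, rank) row is assumed to appear only once.

```python
import sqlite3

# The query from the answer; only the inline comment was dropped.
QUERY = """
select a.id, b.id, a.cnt
from (select a.*, count(*) over (partition by a.id) as cnt
      from work a
     ) a join
     (select b.*, count(*) over (partition by b.id) as cnt
      from work b
     ) b
     on a.rank = b.rank and a.cnt = b.cnt
where a.id < b.id
group by a.id, b.id, a.cnt
having count(*) = a.cnt
"""

# Hypothetical rows for illustration only, not the data from the question.
HYPOTHETICAL_ROWS = [
    ("x", 1), ("x", 2), ("x", 3),
    ("y", 1), ("y", 2), ("y", 3),
    ("z", 1), ("z", 2),
    ("w", 1), ("w", 2),
]


def run_on_sqlite(rows):
    """Load (id, rank) rows into an in-memory table and run QUERY on it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table work (id text, rank integer)")
        conn.executemany("insert into work (id, rank) values (?, ?)", rows)
        # Sort in Python because QUERY has no order by clause.
        return sorted(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    for id1, id2, cnt in run_on_sqlite(HYPOTHETICAL_ROWS):
        print(id1, id2, cnt)
```

Each returned row is a pair of ids whose rank sets are identical, together with the number of ranks they share. A pair survives the `having` clause only when every rank of the first id found a match in the second id, and the `a.cnt = b.cnt` condition rules out the case where the second id has extra ranks.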

MySQL query to HiveQL
