Question

I'm creating a table for an undirected graph, like this:

+-------------------+------------------------+----------------------+
|     id            |     node_a             |        node_b        |
+-------------------+------------------------+----------------------+
|     1             |         a              |           b          |
+-------------------+------------------------+----------------------+
|     2             |         a              |           c          |
+-------------------+------------------------+----------------------+
|     3             |         a              |           d          |
+-------------------+------------------------+----------------------+
|     4             |         b              |           a          |
+-------------------+------------------------+----------------------+
|     5             |         b              |           c          |
+-------------------+------------------------+----------------------+
...

Rows id = 1 and id = 4 are duplicates, and one of them should be deleted. What is an efficient way to delete all duplicate rows in this table?

Answer 1

You can do the following to get the distinct rows:

select e.*
from edges e
where e.node_a <= e.node_b  -- <= rather than < so self-loops such as (a, a) are kept
union all
select e.*
from edges e
where e.node_a > e.node_b and
      not exists (select 1
                  from edges e2 
                  where e2.node_a = e.node_b and e2.node_b = e.node_a
                 );
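To check the query against the sample rows from the question, here is a minimal sketch that loads them into an in-memory SQLite database. Assumptions: SQLite stands in for Hive, the edges schema (id, node_a, node_b) is taken from the question's table, and the first branch uses <= instead of < so that a self-loop like (c, c), which matches neither < nor >, is not silently dropped.

```python
import sqlite3

# Sample rows from the question; SQLite is used here only as a stand-in for Hive.
ROWS = [(1, "a", "b"), (2, "a", "c"), (3, "a", "d"), (4, "b", "a"), (5, "b", "c")]

QUERY = """
select e.*
from edges e
where e.node_a <= e.node_b          -- keep every edge already in 'sorted' order
union all
select e.*
from edges e
where e.node_a > e.node_b and       -- keep a reversed edge only if
      not exists (select 1          -- its sorted twin is absent
                  from edges e2
                  where e2.node_a = e.node_b and e2.node_b = e.node_a)
"""


def distinct_edges(rows):
    """Load rows into a throwaway table and return the query result, sorted by id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table edges (id integer, node_a text, node_b text)")
    conn.executemany("insert into edges values (?, ?, ?)", rows)
    result = sorted(conn.execute(QUERY).fetchall())
    conn.close()
    return result


if __name__ == "__main__":
    for row in distinct_edges(ROWS):
        print(row)
```

Row 4 (b, a) is the only one removed, because its twin (a, b) exists as row 1.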

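If there are also duplicates that are not transposed (the same pair stored twice in the same order), use union instead of union all. Note that union only collapses rows that are identical in every selected column, so for this to work you have to select node_a and node_b rather than e.*; rows with different id values are never identical.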
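A small sketch of that difference, again using SQLite in place of Hive. The extra row (6, a, b) is hypothetical data added to create a same-order duplicate, and only the node columns are selected so that union can treat the two (a, b) rows as equal.

```python
import sqlite3

# Hypothetical data: id 6 repeats the pair (a, b) in the same order as id 1.
ROWS = [(1, "a", "b"), (4, "b", "a"), (6, "a", "b")]


def run(set_op):
    """Run the two-branch query joined by set_op ('union' or 'union all')."""
    # Only node_a and node_b are selected; including id would make every row unique.
    query = f"""
    select e.node_a, e.node_b from edges e where e.node_a <= e.node_b
    {set_op}
    select e.node_a, e.node_b from edges e
    where e.node_a > e.node_b and
          not exists (select 1 from edges e2
                      where e2.node_a = e.node_b and e2.node_b = e.node_a)
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("create table edges (id integer, node_a text, node_b text)")
    conn.executemany("insert into edges values (?, ?, ?)", ROWS)
    result = sorted(conn.execute(query).fetchall())
    conn.close()
    return result


print(run("union all"))  # [('a', 'b'), ('a', 'b')]
print(run("union"))      # [('a', 'b')]
```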

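The above keeps the original rows (including their ids and the stored direction of each edge). If you don't need that, a simpler approach is: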

select distinct least(node_a, node_b) as node_a, greatest(node_a, node_b) as node_b
from edges e;
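The question asks about deleting rows, while both queries above only select the distinct ones. One way to delete in place, sketched here under assumptions, is to group by the same least/greatest pair and keep the lowest id in each group. This runs on SQLite, where the two-argument min() and max() are scalar functions that play the role of least() and greatest(). In Hive itself, DELETE works only on transactional (ACID) tables; on an ordinary table the usual route is to write a deduplicated select back with insert overwrite table instead. The choice of "lowest id wins" is an illustrative assumption.

```python
import sqlite3

ROWS = [(1, "a", "b"), (2, "a", "c"), (3, "a", "d"), (4, "b", "a"), (5, "b", "c")]

# Keep the lowest id per unordered pair. The two-argument min()/max() in the
# GROUP BY are scalar (like least()/greatest()); min(id) is the aggregate.
DELETE_DUPLICATES = """
delete from edges
where id not in (
    select min(id)
    from edges
    group by min(node_a, node_b), max(node_a, node_b)
)
"""


def dedupe_in_place(rows):
    """Delete duplicate undirected edges and return the surviving rows by id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table edges (id integer primary key, node_a text, node_b text)")
    conn.executemany("insert into edges values (?, ?, ?)", rows)
    conn.execute(DELETE_DUPLICATES)
    conn.commit()
    remaining = conn.execute("select id, node_a, node_b from edges order by id").fetchall()
    conn.close()
    return remaining


print(dedupe_in_place(ROWS))
# [(1, 'a', 'b'), (2, 'a', 'c'), (3, 'a', 'd'), (5, 'b', 'c')]
```

Because it groups on the unordered pair, this also removes same-order duplicates and keeps self-loops, without needing the union/union all distinction.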

HiveQL: How to delete duplicate rows based on two columns
