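I want to compress the data in a table so that there is no redundancy with respect to order, i.e. a,b is treated the same as b,a.
Specifically, I want to go from the table redundant_relations: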
+------+------+-------+
| p1   | p2   | score |
+------+------+-------+
| a    | b    | 0.8   |
| a    | c    | 0.67  |
| b    | a    | 0.8   |
| c    | a    | 0.67  |
| a    | d    | 0.89  |
| a    | e    | 0.47  |
| d    | a    | 0.89  |
| e    | a    | 0.47  |
+------+------+-------+
to
+------+------+-------+
| p1   | p2   | score |
+------+------+-------+
| a    | b    | 0.8   |
| a    | c    | 0.67  |
| a    | d    | 0.89  |
| a    | e    | 0.47  |
+------+------+-------+
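Here I only want to keep the first relation and discard the reverse one. For example, if A and B are friends with a score of 0.8, I want to keep a single row for their relationship, such as [A,B,0.8], rather than two rows, [A,B,0.8] and [B,A,0.8]. I already have a table containing these relations, and I want to delete the reversed ones.
Thanks in advance.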
Answer 0 (score: 1)
If you know that every pair is present in both directions, then just do:
select rr.*
from redundant_relations rr
where rr.p1 < rr.p2;
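Since the question asks to delete the reversed rows rather than just select around them, the same condition can be turned into a delete. This is only a sketch under the same assumption as above (every pair exists in both directions with the same score); the column types in the `create table` are my guess, not something stated in the question.

```sql
-- Sample data from the question; the column types are assumed.
create table redundant_relations (
    p1    varchar(10),
    p2    varchar(10),
    score decimal(4, 2)
);

insert into redundant_relations (p1, p2, score) values
    ('a', 'b', 0.8),  ('a', 'c', 0.67),
    ('b', 'a', 0.8),  ('c', 'a', 0.67),
    ('a', 'd', 0.89), ('a', 'e', 0.47),
    ('d', 'a', 0.89), ('e', 'a', 0.47);

-- Drop the "reversed" half of every pair, keeping the row with p1 < p2.
delete from redundant_relations
where p1 > p2;

-- Remaining rows: (a,b), (a,c), (a,d), (a,e)
select p1, p2, score
from redundant_relations
order by p1, p2;
```

Note that the string comparison depends on the column's collation, and a row with p1 = p2 is left untouched by this delete.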
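If the relations are not all paired, or the scores are not the same in both directions, then it gets more complicated. In that case, I would suggest an index on (p1, p2, score) and: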
select rr.*
from redundant_relations rr
where rr.p1 < rr.p2
union all
select rr.*
from redundant_relations rr
where rr.p1 > rr.p2 and
not exists (select 1
from redundant_relations rr2
where rr2.p1 = rr.p2 and rr2.p2 = rr.p1 and rr2.score = rr.score
);
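To make the second case concrete, here is a hedged sketch with made-up rows (the data, the column types, and the index name `idx_rr_p1_p2_score` are illustrative assumptions). It creates the suggested index and then deletes only those reversed rows whose mirror image with the same score is also present, so unpaired rows and rows with differing scores survive.

```sql
-- Hypothetical data: one clean pair, one unpaired row, one pair with differing scores.
create table redundant_relations (
    p1    varchar(10),
    p2    varchar(10),
    score decimal(4, 2)
);

insert into redundant_relations (p1, p2, score) values
    ('a', 'b', 0.8), ('b', 'a', 0.8),   -- clean pair
    ('d', 'c', 0.5),                    -- no (c, d) row exists
    ('e', 'f', 0.3), ('f', 'e', 0.9);   -- scores differ

-- Index to support the correlated lookup; the name is arbitrary.
create index idx_rr_p1_p2_score on redundant_relations (p1, p2, score);

-- Delete a reversed row only when its mirror (same score) is present.
delete from redundant_relations
where p1 > p2
  and exists (select 1
              from redundant_relations rr2
              where rr2.p1 = redundant_relations.p2
                and rr2.p2 = redundant_relations.p1
                and rr2.score = redundant_relations.score
             );

-- Remaining rows: (a,b,0.8), (d,c,0.5), (e,f,0.3), (f,e,0.9)
select p1, p2, score
from redundant_relations
order by p1, p2;
```

This form should run on PostgreSQL and SQLite. As far as I know, MySQL rejects a subquery on the table being deleted from, so there you would likely need a multi-table delete with a self-join or a temporary table instead.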