如何在没有__hash__的情况下删除对象列表中的重复项

时间:2017-10-10 10:23:08

标签: python hash bioinformatics equality ete3

我有一个自定义对象列表,我想从中删除重复项。通常,您可以通过为对象定义__eq____hash__,然后获取对象列表的set来执行此操作。我已经定义了__eq__,但我无法找到一种实现__hash__的好方法,以便它为相同的对象返回相同的值。

更具体地说,我有一个来自ete3 toolkitTree类的类。如果Robinson-Foulds距离为零,我已将两个对象定义为相等。

from ete3 import Tree

class MyTree(Tree):

    def __init__(self, *args, **kwargs):
        super(MyTree, self).__init__(*args, **kwargs)

    def __eq__(self, other):
        rf = self.robinson_foulds(other, unrooted_trees=True)
        return not bool(rf[0])

newicks = ['((D, C), (A, B),(E));',
           '((D, B), (A, C),(E));',
           '((D, A), (B, C),(E));',
           '((C, D), (A, B),(E));',
           '((C, B), (A, D),(E));',
           '((C, A), (B, D),(E));',
           '((B, D), (A, C),(E));',
           '((B, C), (A, D),(E));',
           '((B, A), (C, D),(E));',
           '((A, D), (B, C),(E));',
           '((A, C), (B, D),(E));',
           '((A, B), (C, D),(E));']

trees = [MyTree(newick) for newick in newicks]

print len(trees)       # 12
print len(set(trees))  # also 12, not what I want!

print len(trees)print len(set(trees))都返回12,但这不是我想要的,因为有几个对象彼此相等:

from itertools import product
for t1, t2 in product(newicks, repeat=2):
    if t1 != t2:
        mt1 = MyTree(t1)
        mt2 = MyTree(t2)
        if mt1 == mt2:
            print t1, '==', t2

返回:

((D, C), (A, B),(E)); == ((C, D), (A, B),(E));
((D, C), (A, B),(E)); == ((B, A), (C, D),(E));
((D, C), (A, B),(E)); == ((A, B), (C, D),(E));
((D, B), (A, C),(E)); == ((C, A), (B, D),(E));
((D, B), (A, C),(E)); == ((B, D), (A, C),(E));
((D, B), (A, C),(E)); == ((A, C), (B, D),(E));
((D, A), (B, C),(E)); == ((C, B), (A, D),(E));
((D, A), (B, C),(E)); == ((B, C), (A, D),(E));
((D, A), (B, C),(E)); == ((A, D), (B, C),(E));
((C, D), (A, B),(E)); == ((D, C), (A, B),(E));
((C, D), (A, B),(E)); == ((B, A), (C, D),(E));
((C, D), (A, B),(E)); == ((A, B), (C, D),(E));
((C, B), (A, D),(E)); == ((D, A), (B, C),(E));
((C, B), (A, D),(E)); == ((B, C), (A, D),(E));
((C, B), (A, D),(E)); == ((A, D), (B, C),(E));
((C, A), (B, D),(E)); == ((D, B), (A, C),(E));
((C, A), (B, D),(E)); == ((B, D), (A, C),(E));
((C, A), (B, D),(E)); == ((A, C), (B, D),(E));
((B, D), (A, C),(E)); == ((D, B), (A, C),(E));
((B, D), (A, C),(E)); == ((C, A), (B, D),(E));
((B, D), (A, C),(E)); == ((A, C), (B, D),(E));
((B, C), (A, D),(E)); == ((D, A), (B, C),(E));
((B, C), (A, D),(E)); == ((C, B), (A, D),(E));
((B, C), (A, D),(E)); == ((A, D), (B, C),(E));
((B, A), (C, D),(E)); == ((D, C), (A, B),(E));
((B, A), (C, D),(E)); == ((C, D), (A, B),(E));
((B, A), (C, D),(E)); == ((A, B), (C, D),(E));
((A, D), (B, C),(E)); == ((D, A), (B, C),(E));
((A, D), (B, C),(E)); == ((C, B), (A, D),(E));
((A, D), (B, C),(E)); == ((B, C), (A, D),(E));
((A, C), (B, D),(E)); == ((D, B), (A, C),(E));
((A, C), (B, D),(E)); == ((C, A), (B, D),(E));
((A, C), (B, D),(E)); == ((B, D), (A, C),(E));
((A, B), (C, D),(E)); == ((D, C), (A, B),(E));
((A, B), (C, D),(E)); == ((C, D), (A, B),(E));
((A, B), (C, D),(E)); == ((B, A), (C, D),(E));

所以我的问题是:

  • 如果__hash__有效,set(trees)对我的案例有什么好的__hash__实施?
  • 或者如何在未定义{{1}}的情况下删除与我的列表相同的对象?

0 个答案:

没有答案