Question

I have:

tuple1 = token1, token2
tuple2 = token2, token1
for tuple in [tuple1, tuple2]:
    if tuple in dict:
        dict[tuple] += 1
    else:
        dict[tuple] = 1

However, tuple1 and tuple2 both end up with the same count. What is a way to hash a group of 2 things such that order matters?

Answer 1

Order is taken into account when hashing:

>>> hash((1,2))
1299869600
>>> hash((2,1))
1499606158

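This assumes that the objects themselves have unique hashes. Even if they don't, you can still be fine when using the tuples as dictionary keys (as long as the objects themselves are not equal as defined by their __eq__ method):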

>>> t1 = 'a',hash('a') 
>>> [hash(x) for x in t1]  # both elements of the tuple have the same hash value, since an `int` hashes to itself in CPython
[-468864544, -468864544]
>>> t2 = hash('a'),'a'
>>> hash(t1)
1486610051
>>> hash(t2)
1486610051
>>> d = {t1:1,t2:2}  # This is OK: dicts don't fail when there is a hash collision
>>> d
{('a', -468864544): 1, (-468864544, 'a'): 2}
>>> d[t1]+=7
>>> d[t1]
8
>>> d[t1]+=7
>>> d[t1]
15
>>> d[t2]   # d[t2] is untouched, as expected
2

Note that, because of the hash collisions, this dict is likely to be less efficient than a dict whose keys do not collide :)
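
To make the collision case reproducible on any Python version, here is a minimal sketch using a hypothetical class `Collider` (an assumption for illustration, not part of the original answer) whose instances all share one hash value. On CPython the two tuples then hash identically, yet the dict still tells them apart through `__eq__`:

```python
class Collider:
    """Hypothetical key type: every instance hashes to the same value."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 42  # deliberate collision for every instance

    def __eq__(self, other):
        return isinstance(other, Collider) and self.name == other.name


a, b = Collider("a"), Collider("b")
t1, t2 = (a, b), (b, a)

# Both tuples land in the same hash bucket, but they compare unequal,
# so the dict keeps them as two separate keys.
counts = {t1: 1, t2: 2}
counts[t1] += 7

same_hash = hash(t1) == hash(t2)
```

After this runs, `counts[t1]` is 8 and `counts[t2]` is still 2, even though `same_hash` is true.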

Answer 2

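The reason they get the same count is that your code explicitly increments both the token1,token2 and the token2,token1 counts. If you don't do that, the counts won't stay in lockstep: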

In [16]: import collections

In [17]: d = collections.defaultdict(int)

In [18]: d[1,2] += 1

In [19]: d[1,2]
Out[19]: 1

In [20]: d[2,1]
Out[20]: 0
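
A small sketch of the difference, assuming some hypothetical observed pairs (the `pairs` list and the names `symmetric` and `ordered` are made up for illustration). The first loop mirrors the question's code and bumps both orderings for every pair; the second counts only the ordering that actually occurred:

```python
from collections import defaultdict

pairs = [("a", "b"), ("a", "b"), ("b", "a")]  # hypothetical observed pairs

# What the question's loop does: increment both orderings for every pair seen.
symmetric = defaultdict(int)
for first, second in pairs:
    for key in [(first, second), (second, first)]:
        symmetric[key] += 1

# Counting only the ordering that was actually observed.
ordered = defaultdict(int)
for pair in pairs:
    ordered[pair] += 1
```

With this data, `symmetric` ends up with 3 for both `("a", "b")` and `("b", "a")`, while `ordered` has 2 and 1 respectively.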

Answer 3

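It looks like you have posted a single instance of your loop body. I would suggest using collections.Counter for what you are trying to do; it does exactly what you want, but in one line: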

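import collections

# The reversed pairs must be parenthesized: a bare `j,i for ...` is a syntax error.
counter = (collections.Counter(myListOfTuples) +
           collections.Counter([(j, i) for i, j in myListOfTuples]))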
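
As a rough usage sketch of that approach, with a hypothetical input list (`my_list_of_tuples` and its contents are assumptions for illustration):

```python
import collections

my_list_of_tuples = [("a", "b"), ("a", "b"), ("b", "c")]  # hypothetical data

# Count each pair as given, then each pair reversed, and add the two tallies.
forward = collections.Counter(my_list_of_tuples)
reverse = collections.Counter((j, i) for i, j in my_list_of_tuples)
counter = forward + reverse
```

Here `counter[("a", "b")]` and `counter[("b", "a")]` are both 2, and a pair that never appeared, such as `("c", "a")`, reads as 0 without being inserted.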

Using tuples where order matters in Python?
