Question

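Say I have an incoming stream of items that I'm processing. For each item, I extract some data and store it. But many of the items are identical. I want to keep track of receiving them, but not store the same data multiple times. I could implement it like this, but it seems clunky: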

item_cache = {}
item_record = []

def process(input_item):
    item = Item(input_item)  # implements __hash__
    try:
        item_record.append(item_cache[item])
    except KeyError:
        item_cache[item] = item  # this is the part that seems weird
        item_record.append(item)

Am I just overthinking this? Is d[thing] = thing a fairly normal construct in Python?

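EDIT

In response to the comment below, here is a more complete example showing how this code avoids storing duplicate copies of the input data.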

class Item(object):
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

    def __eq__(self, other):
        return self.a == other.a and self.b == other.b and self.c == other.c

    def __ne__(self, other):
        return not (self == other)

    def __hash__(self):
        return hash((self.a, self.b, self.c))

    def __repr__(self):
        return '(%s, %s, %s)' % (self.a, self.b, self.c)


item_cache = {}
item_record = []


def process_item(new_item):
    item = Item(*new_item)
    try:
        item_record.append(item_cache[item])
    except KeyError:
        item_cache[item] = item
        item_record.append(item)

    del item  # this happens anyway, just adding for clarity.

for item in ((1, 2, 3), (2, 3, 4), (1, 2, 3), (2, 3, 4)):
    process_item(item)

print([id(item) for item in item_record])
print(item_record)

Answer 1

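Unfortunately, yes, you are overthinking this a bit. All you need to do is use sets.

A set object is an unordered collection of distinct hashable objects. Common uses include membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference.

Your code can be replaced with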
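The uses listed above can be sketched with a few built-in set operations. The variable names and values here are illustrative only and do not come from the question.

```python
# Illustrative values only; none of these names come from the question.
seen = {1, 2, 3}
incoming = [2, 3, 3, 4]

unique_incoming = set(incoming)            # duplicates in the list collapse
is_known = 2 in seen                       # membership test
common = seen & unique_incoming            # intersection
combined = seen | unique_incoming          # union
only_seen = seen - unique_incoming         # difference
either_not_both = seen ^ unique_incoming   # symmetric difference
```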

item_record = set()
for .... :
   item_record.add(input_item)
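A runnable sketch of the same idea, assuming the input items are plain hashable tuples like the sample data in the question's edit (the `inputs` list is an assumption made for illustration, not part of the original snippet):

```python
# Sample data borrowed from the question's edit; the list name is hypothetical.
inputs = [(1, 2, 3), (2, 3, 4), (1, 2, 3), (2, 3, 4)]

item_record = set()
for input_item in inputs:
    # Adding an item equal to one already present leaves the set unchanged.
    item_record.add(input_item)

# Each distinct item is stored once. The set does not keep arrival order
# or how many times each item was received.
distinct_count = len(item_record)
```

Note the trade-off: if a record of every arrival is needed, a set alone does not provide it.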

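Update: Although you say "but not store the same data multiple times", your code actually stores multiple items. In the original code, the item_record.append() call is executed regardless of whether the item already exists in the item cache: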

try:
    item_record.append(item_cache[item])
except KeyError:
    item_cache[item] = item  # this is the part that seems weird
    item_record.append(item)

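So the list will contain duplicates. However, I am not sure you are appending the right object, since you haven't shared the code for the Item class. I believe what we really have here is an XY problem. Why not post a new question and explain the problem you are actually trying to solve?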
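Whether those repeated list entries are separate copies or repeated references to one object can be checked directly. The sketch below re-declares a trimmed version of the Item class from the question's edit and, as a substitution for the try/except, uses dict.setdefault, which returns the existing value for a key or stores and returns the default. It is an illustration under those assumptions, not the asker's exact code.

```python
class Item(object):
    """Trimmed version of the Item class from the question's edit."""

    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c

    def __eq__(self, other):
        if not isinstance(other, Item):
            return NotImplemented
        return (self.a, self.b, self.c) == (other.a, other.b, other.c)

    def __hash__(self):
        return hash((self.a, self.b, self.c))


item_cache = {}
item_record = []

for raw in ((1, 2, 3), (2, 3, 4), (1, 2, 3), (2, 3, 4)):
    item = Item(*raw)
    # setdefault stores item under itself the first time it is seen and
    # returns the already cached object on every later, equal item.
    item_record.append(item_cache.setdefault(item, item))

# Four arrivals are recorded, but only two Item objects are kept alive.
same_object = item_record[0] is item_record[2]
```

Here `item_record` has four entries while `item_cache` has two, and `same_object` is true because the first and third entries refer to the same cached instance.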

Is it normal to use an object as its own key in a dictionary/hashmap (in Python)?
