Question

Consider the following example list:

l = [
    {
        'post':1,
        'user':1,
        'other_stuff':'something',
        'more':'you get the point'
    },
    {
        'post':1,
        'user':2,
        'other_stuff':'something',
        'more':'you get the point'
    },
    {
        'post':2,
        'user':1,
        'other_stuff':'something',
        'more':'you get the point'
    },
]

I need to be able to check whether a 'user' is already connected to a 'post', which I can do with a loop:

user = 1
post = 1
response = False
for connection in l:
    if connection['post'] == post and connection['user'] == user:
        response = True
        break

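This works fine. The problem is that in the real scenario, l will be appended to 1.5 million times, and this loop runs on every insertion because it has to check whether the entry already exists. So each of the last 500k insertions will iterate over a list of more than 1 million dictionaries. This is hardly the most efficient approach! My question is: what is the best way to do this without such an exhaustive scan?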

Note: I don't necessarily know the values of the other keys in each dictionary, so I can't simply do an "if x in l" check.

Answer 1

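I would reconsider how the data structure is laid out. If you need efficient access by post and user pair, I would consider storing the data in the following format: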

l = { (1, 1) : {'other_stuff':'something', ...},
      (1, 2) : {'other_stuff':'something', ...},
      (2, 1) : {'other_stuff':'something', ...} }
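
If the data already exists as a list of dictionaries, it could be converted into this layout in a single pass. The following is only a sketch: the names records, index_by_pair, and by_pair are hypothetical, and it assumes the key order is (post, user), which the answer does not state explicitly. Any order works as long as it is used consistently.

```python
# Hypothetical sample data mirroring the list in the question.
records = [
    {'post': 1, 'user': 1, 'other_stuff': 'something'},
    {'post': 1, 'user': 2, 'other_stuff': 'something'},
    {'post': 2, 'user': 1, 'other_stuff': 'something'},
]

def index_by_pair(rows):
    """Build a dict keyed by (post, user); the remaining fields become the value."""
    indexed = {}
    for row in rows:
        key = (row['post'], row['user'])  # assumed key order: (post, user)
        # Keep every field except the two that now live in the key.
        indexed[key] = {k: v for k, v in row.items() if k not in ('post', 'user')}
    return indexed

by_pair = index_by_pair(records)
```

Note that if the same pair appears more than once in the list, later rows overwrite earlier ones in this sketch.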

With the data keyed by pair, the membership check becomes an O(1) lookup on average:

user_post_pair = (1, 1)
if user_post_pair in l:
    pass  # Stuff...
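
For the scenario in the question, where every insertion must first check for an existing entry, the same idea might be wrapped up roughly as follows. This is a minimal sketch rather than a definitive implementation: connections and add_connection are hypothetical names, and the key order (post, user) is an assumption.

```python
connections = {}  # maps (post, user) -> dict of the remaining fields

def add_connection(post, user, **other_fields):
    """Insert a record unless the (post, user) pair is already present.

    Returns True if the record was added, False if the pair already existed.
    """
    key = (post, user)  # assumed key order: (post, user)
    if key in connections:  # hash lookup, no scan over earlier entries
        return False
    connections[key] = other_fields
    return True

added_first = add_connection(1, 1, other_stuff='something')
added_again = add_connection(1, 1, more='you get the point')
```

Each check costs roughly the same regardless of how many entries are already stored, so 1.5 million insertions no longer involve rescanning the earlier ones. If the list itself has to be kept, a separate set of (post, user) tuples maintained alongside it would give the same constant-time check.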
