Question

I have a list of dictionaries like so:

data = [{'Person1':['a', 'b', 'c']}, {'Person2':['1', '2', '3']}, {'Person3':['x', 'y', 'z']}]

It is made up of nearly 7,000,000 dictionaries. I also have a list of strings, e.g.

people = ['Person1', 'Person3']

of length 450,000. All strings in this list exist as keys in the list of dictionaries.

What is the fastest/most efficient way to filter the list of dictionaries based on this list of strings, so that I get a new dictionary containing only the keys that correspond to strings in the list, e.g.

d = {'Person1':['a', 'b', 'c'], 'Person3':['x', 'y', 'z']}

My current code takes a long time to run, and I would like to know the best way to approach this problem.

Answer 1

A first suggestion, using a dict comprehension:

from collections import ChainMap

data = [
    {'Person1':['a', 'b', 'c']}, 
    {'Person2':['1', '2', '3']}, 
    {'Person3':['x', 'y', 'z']}
]
people = ['Person1', 'Person3']

big_dict = dict(ChainMap(*data))

# drop duplicates
people = list(set(people))

smaller_dict = {person: big_dict[person] for person in people}

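For `ChainMap`, see the `collections` module documentation. I use people as a list (rather than a set) because it has been reported that lists perform slightly faster in these cases.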
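As a hedged alternative to the `ChainMap` approach (a sketch, not part of the answer above), the filtered dictionary can also be built in a single pass over data, with people converted to a set so that each membership test takes constant time on average. The names `filter_by_keys` and `wanted` are illustrative only.

```python
def filter_by_keys(data, people):
    """Merge only those entries of `data` whose key appears in `people`."""
    wanted = set(people)  # set lookup is O(1) on average; also drops duplicates
    result = {}
    for entry in data:  # each entry is a small dict, e.g. {'Person1': [...]}
        for key, value in entry.items():
            if key in wanted:
                result[key] = value
    return result


data = [
    {'Person1': ['a', 'b', 'c']},
    {'Person2': ['1', '2', '3']},
    {'Person3': ['x', 'y', 'z']},
]
people = ['Person1', 'Person3']

smaller_dict = filter_by_keys(data, people)
print(smaller_dict)
```

If a key occurs in more than one dictionary, this sketch keeps the last occurrence, whereas a `ChainMap` lookup returns the first one.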

Answer 2

A few benchmarks -

Nested for loop

%%timeit

out = []
for i in data:
    for j in people:
        if list(i.keys())[0]==j:
            out.append(i)
            
#1.78 µs ± 55.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

List comprehension with `in`

%%timeit

out = [i for i in data if list(i.keys())[0] in people]

#1.02 µs ± 36.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

List comprehension with `set.intersection`

%%timeit

out = [i for i in data if set(i).intersection(people)]

#1.04 µs ± 27.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
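The timings above are around one microsecond per loop, which suggests they were taken on the three-element sample, so they may not reflect behaviour at the scale described in the question. The sketch below, which assumes synthetic data and the illustrative names `with_list` and `with_set`, uses the standard `timeit` module to compare membership tests against a list with membership tests against a set on a somewhat larger input. No timings are quoted here because they depend on the machine.

```python
import timeit

# Synthetic data (an assumption for illustration): 5,000 one-key dicts
# and 500 names to keep.
data = [{f'Person{i}': [i]} for i in range(5_000)]
people = [f'Person{i}' for i in range(0, 5_000, 10)]
people_set = set(people)


def with_list():
    # `in` on a list scans the list: O(len(people)) per dictionary
    return [d for d in data if next(iter(d)) in people]


def with_set():
    # `in` on a set is a hash lookup: O(1) on average per dictionary
    return [d for d in data if next(iter(d)) in people_set]


# Both variants must select exactly the same dictionaries.
assert with_list() == with_set()

t_list = timeit.timeit(with_list, number=3)
t_set = timeit.timeit(with_set, number=3)
print(f'list membership: {t_list:.4f} s, set membership: {t_set:.4f} s')
```

Here `next(iter(d))` fetches the single key of each dictionary without building a temporary list, unlike `list(d.keys())[0]`.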

Filtering a list of dictionaries based on another list of strings

2 answers:
