If I have a dict of lists, for example:
{
    'id1': ['a', 'b', 'c'],
    'id2': ['a', 'b'],
    # etc.
}
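I want to count the lists by size, i.e. the number of ids whose list has length > 0, > 1, > 2, and so on.
Is there a simpler way than nested for loops like this: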
def countByLength(userIdDict):  # hypothetical wrapper so the `return` is valid
    dictOfOutputs = {}
    for x in range(1, 11):
        count = 0
        for agentId in userIdDict:
            if len(userIdDict[agentId]) > x:
                count += 1
        dictOfOutputs[x] = count
    return dictOfOutputs
Answer 0 (score: 2)
I'd use a collections.Counter() object to collect the lengths, then accumulate the sums:
from collections import Counter

lengths = Counter(len(v) for v in userIdDict.values())
total = 0
accumulated = {}
for length in range(max(lengths), -1, -1):
    count = lengths.get(length, 0)
    total += count
    accumulated[length] = total
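So this collects a count for each length, then builds a dictionary of cumulative counts. Note that accumulated[n] is the number of lists with length >= n, so the number of lists with length > x is accumulated.get(x + 1, 0). This is an O(N) algorithm; you loop over all the values once, then add a few smaller linear loops (for max() and the accumulation loop):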
>>> from collections import Counter
>>> import random
>>> testdata = {''.join(random.choice('abcdefghijklmnopqrstuvwxyz') for _ in range(5)): [None] * random.randint(1, 10) for _ in range(100)}
>>> lengths = Counter(len(v) for v in testdata.values())
>>> lengths
Counter({8: 14, 7: 13, 2: 11, 3: 10, 4: 9, 5: 9, 9: 9, 10: 9, 1: 8, 6: 8})
>>> total = 0
>>> accumulated = {}
>>> for length in range(max(lengths), -1, -1):
...     count = lengths.get(length, 0)
...     total += count
...     accumulated[length] = total
...
>>> accumulated
{0: 100, 1: 100, 2: 92, 3: 81, 4: 71, 5: 62, 6: 53, 7: 45, 8: 32, 9: 18, 10: 9}
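To connect the cumulative counts back to the question's "> x" output, here is a minimal sketch that wraps the same Counter approach in a function. The function name `count_longer_than` and its `thresholds` parameter are assumptions made for illustration; they do not appear in the original answer.

```python
from collections import Counter


def count_longer_than(user_id_dict, thresholds=range(1, 11)):
    """Return {x: number of lists whose length is > x} for each threshold x.

    Hypothetical helper: the name and the `thresholds` parameter are
    illustrative, not part of the original answer.
    """
    lengths = Counter(len(v) for v in user_id_dict.values())

    # at_least[n] = number of lists with length >= n
    at_least = {}
    total = 0
    for length in range(max(lengths, default=0), -1, -1):
        total += lengths.get(length, 0)
        at_least[length] = total

    # "length > x" is the same as "length >= x + 1"; thresholds beyond the
    # longest list have no matches, hence the default of 0.
    return {x: at_least.get(x + 1, 0) for x in thresholds}


example = {'id1': ['a', 'b', 'c'], 'id2': ['a', 'b']}
result = count_longer_than(example, thresholds=range(0, 4))
print(result)  # {0: 2, 1: 2, 2: 1, 3: 0}
```

Passing `default=0` to max() keeps the sketch from raising on an empty dict, which the bare max(lengths) call would do.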
Answer 1 (score: 0)
Yes, there is a better way.
First, index the ids by the length of their data:
from collections import defaultdict

my_dict = {
    'id1': ['a', 'b', 'c'],
    'id2': ['a', 'b'],
}

ids_by_data_len = defaultdict(list)
for id_, data in my_dict.items():
    # append to the index, not to my_dict itself
    ids_by_data_len[len(data)].append(id_)
Now, build your dictionary:
output_dict = {}
accumulator = 0
# note: the end of a range is non-inclusive!
for data_len in reversed(range(1, max(ids_by_data_len.keys()) + 1)):
    accumulator += len(ids_by_data_len.get(data_len, []))
    # accumulator = number of ids with at least data_len items
    output_dict[data_len - 1] = accumulator
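This makes one pass over the data plus one pass over the lengths, i.e. O(n + m) for n ids and a maximum list length of m, rather than the O(n·m) of the nested loops, so it is also much faster on large data sets.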
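The two steps above can be combined into one self-contained sketch. The wrapper function `ids_longer_than` is a hypothetical name introduced here for illustration; the variable names inside follow the answer.

```python
from collections import defaultdict


def ids_longer_than(my_dict):
    """Return {x: number of ids whose list has more than x items}.

    Hypothetical wrapper around the two steps described in the answer.
    """
    # Step 1: bucket the ids by the length of their list.
    ids_by_data_len = defaultdict(list)
    for key, data in my_dict.items():
        ids_by_data_len[len(data)].append(key)

    # Step 2: walk the lengths from longest to shortest with a running total.
    output_dict = {}
    accumulator = 0
    longest = max(ids_by_data_len, default=0)  # 0 for an empty input
    for data_len in range(longest, 0, -1):
        accumulator += len(ids_by_data_len.get(data_len, []))
        # accumulator counts lists with length >= data_len, i.e. > data_len - 1
        output_dict[data_len - 1] = accumulator
    return output_dict


my_dict = {'id1': ['a', 'b', 'c'], 'id2': ['a', 'b']}
print(ids_longer_than(my_dict))  # {2: 1, 1: 2, 0: 2}
```

Unlike the question's loop, which always reports thresholds 1 through 10, this version only produces keys from 0 up to one less than the longest list, so a caller that needs a fixed range would have to fill in zeros for the larger thresholds.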