I have a list of lists:
x = [ [4, 'c', 'b', 'd'], [2, 'e', 'c', 'a'], [5, 'a', 'c'] ]
I need to convert it to:
x1 = [ ['c', 4, 2, 5], ['b', 4], ['d', 4], ['e', 2], ['a', 2, 5] ]
Explanation:
'c' appears in lists starting with 4, 2, 5
'b' appears only in the list starting with 4
'd' appears only in the list starting with 4
...
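Obviously this is a toy example; my real list is about 30 MB in a flat file.
I tried two nested for loops, but on my MacBook Pro (8 GB RAM) processing just 5% of the file takes about 5 hours.
Is there an efficient way to do this?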
Answer 0 (score: 3)
I also did it with two nested loops:
from collections import defaultdict

x = [ [4, 'c', 'b', 'd'], [2, 'e', 'c', 'a'], [5, 'a', 'c'] ]

# Map each item to the list of keys (first elements) of the groups it appears in
d = defaultdict(list)
for group in x:
    key = group[0]
    for item in group[1:]:
        d[item].append(key)
print(d)

# and to convert back to a list of lists:
x1 = [[key] + value for key, value in d.items()]
print(x1)
Output:
defaultdict(<class 'list'>, {'c': [4, 2, 5], 'b': [4], 'd': [4], 'e': [2], 'a': [2, 5]})
[['c', 4, 2, 5], ['b', 4], ['d', 4], ['e', 2], ['a', 2, 5]]
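A note on efficiency:
Inside the outer loop I compute group[1:]. If group is large, even just copying that slice can be expensive. If so, iterating with an explicit iterator avoids the copy: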
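# d is the defaultdict(list) created above
for group in x:
    it = iter(group)
    key = next(it)      # consume the first element as the key
    for item in it:     # the remaining elements, no slice copy
        d[item].append(key)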
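Another way to skip the first element without building a slice is the standard library's `itertools.islice`, sketched below. It should behave the same as the `iter`/`next` loop; the function name `invert` is hypothetical, chosen only for this illustration.

```python
from collections import defaultdict
from itertools import islice


def invert(groups):
    """Hypothetical helper: map each item to the keys (first elements) of the groups containing it."""
    d = defaultdict(list)
    for group in groups:
        key = group[0]
        # islice yields group[1], group[2], ... lazily instead of copying a slice
        for item in islice(group, 1, None):
            d[item].append(key)
    return d


x = [[4, 'c', 'b', 'd'], [2, 'e', 'c', 'a'], [5, 'a', 'c']]
d = invert(x)
x1 = [[item] + keys for item, keys in d.items()]
print(x1)  # [['c', 4, 2, 5], ['b', 4], ['d', 4], ['e', 2], ['a', 2, 5]]
```

Like the loop above, this is a single pass, so the total work grows linearly with the number of items.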
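The running time is O(n), where n is the total number of items across all lists. I can't measure whether this processing or reading the 30 MB file will be the slower part.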
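One way to find out which part dominates is to time reading and processing separately. The sketch below assumes a file format the question does not specify: one inner list per line, whitespace-separated, with the integer key first (e.g. `4 c b d`). An in-memory `io.StringIO` stands in for the file so the example is self-contained; `build_index`, `SAMPLE`, and the filename `data.txt` are hypothetical.

```python
import io
import time
from collections import defaultdict

# Assumed format (hypothetical): one group per line, integer key first
SAMPLE = "4 c b d\n2 e c a\n5 a c\n"


def build_index(lines):
    """Hypothetical helper: build item -> [keys] from text lines in one pass."""
    d = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        key = int(fields[0])
        for item in fields[1:]:
            d[item].append(key)
    return d


# For a real file, replace io.StringIO(SAMPLE) with open("data.txt")
with io.StringIO(SAMPLE) as f:
    t0 = time.perf_counter()
    lines = f.readlines()        # reading only
    t1 = time.perf_counter()
index = build_index(lines)       # processing only
t2 = time.perf_counter()

x1 = [[item] + keys for item, keys in index.items()]
print(x1)
print(f"read: {t1 - t0:.6f} s, process: {t2 - t1:.6f} s")
```

For a very large file, passing the open file object straight to `build_index` instead of calling `readlines()` avoids holding every line in memory at once, at the cost of no longer timing the two phases separately.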
Answer 1 (score: 1)
Based on @quamrana's assumption about what you actually want to achieve:
x = [ [4, 'c', 'b', 'd'],
[2, 'e', 'c', 'a'],
[5, 'a', 'c'] ]
letters = {i for y in x for i in y if isinstance(i, str)}
y = [[i] + [sub[0] for sub in x if i in sub] for i in letters]
print(y) # e.g. [['e', 2], ['d', 4], ['a', 2, 5], ['b', 4], ['c', 4, 2, 5]] (order varies because letters is a set)
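Note that this version rescans every sublist of `x` for each distinct letter, so its work is roughly (number of distinct letters) × (total number of items). On a 30 MB input that can be far slower than the single-pass `defaultdict` loop from Answer 0. The sketch below checks that both approaches give the same result once sorted by the leading letter, which also gives a stable display order.

```python
from collections import defaultdict

x = [[4, 'c', 'b', 'd'], [2, 'e', 'c', 'a'], [5, 'a', 'c']]

# Answer 1: one full pass over x per distinct letter
letters = {i for y in x for i in y if isinstance(i, str)}
y = [[i] + [sub[0] for sub in x if i in sub] for i in letters]

# Answer 0: a single pass over x
d = defaultdict(list)
for group in x:
    for item in group[1:]:
        d[item].append(group[0])
x1 = [[k] + v for k, v in d.items()]

# Leading letters are distinct, so sorting compares only those strings
print(sorted(y))                # [['a', 2, 5], ['b', 4], ['c', 4, 2, 5], ['d', 4], ['e', 2]]
print(sorted(y) == sorted(x1))  # True
```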