Question

I have a list of lists:

x = [ [4, 'c', 'b', 'd'], [2, 'e', 'c', 'a'], [5, 'a', 'c'] ]

I need to convert it to:

x1 = [ ['c', 4, 2, 5], ['b', 4], ['d', 4], ['e', 2], ['a', 2, 5] ]

Explanation:

'c' appears in lists starting with 4, 2, 5
'b' appears in only the list starting with 4
'd' appears in only the list starting with 4
...

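Obviously this is a toy example, but my real list is about 30 MB in a flat file.

I tried two nested for loops, but processing just 5% of the file took about 5 hours on my MacBook Pro (8 GB RAM).

Is there an efficient way to do this?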

Answer 1

I still do it with two nested loops:

from collections import defaultdict

x = [ [4, 'c', 'b', 'd'], [2, 'e', 'c', 'a'], [5, 'a', 'c'] ]

d = defaultdict(list)

for group in x:
    key = group[0]
    for item in group[1:]:
        d[item].append(key)


print(d)

# and to convert back to list:
x1 = [[key]+value for (key,value) in d.items()]
print(x1)

Output:

defaultdict(<class 'list'>, {'c': [4, 2, 5], 'b': [4], 'd': [4], 'e': [2], 'a': [2, 5]})
[['c', 4, 2, 5], ['b', 4], ['d', 4], ['e', 2], ['a', 2, 5]]

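A note on efficiency:

Inside the outer loop I compute group[1:]. If group is large, even just copying the list can be expensive. If so, this loop may be better, since it walks the list with an iterator instead of slicing it: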

for group in x:
    it = iter(group)
    key = next(it)
    for item in it:
        d[item].append(key)

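The running time is O(n), where n is the total number of items across all the lists. I haven't measured whether this processing or reading the contents of the 30 MB file is the slower part.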
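
If the data comes from a flat file, the same single pass can run while the file is streamed line by line, so the full list of lists never has to be built first. This is only a minimal sketch: the question does not say what the file looks like, so the comma-separated layout with the key in the first field is an assumption, and build_index is an illustrative name, not something from the question.

```python
from collections import defaultdict
import io


def build_index(lines):
    """Map each item to the keys of the rows it appears in, in one pass."""
    index = defaultdict(list)
    for line in lines:
        # ASSUMPTION: comma-separated fields, first field is the row key.
        fields = [f.strip() for f in line.split(",")]
        if fields[0] == "":
            continue  # skip blank lines
        key, items = fields[0], fields[1:]
        for item in items:
            index[item].append(key)
    return index


# Stand-in for an open file object; a real file would be iterated the same way.
sample = io.StringIO("4,c,b,d\n2,e,c,a\n5,a,c\n")
index = build_index(sample)
x1 = [[item] + keys for item, keys in index.items()]
print(x1)
```

Note that the keys stay strings here ('4' rather than 4), because nothing in the sketch converts them; add int(key) if the real keys are numeric.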

Answer 2

Based on @quamrana's assumption about what you are actually trying to accomplish:

x = [ [4, 'c', 'b', 'd'], 
      [2, 'e', 'c', 'a'], 
      [5, 'a', 'c'] ]

letters = {i for y in x for i in y if isinstance(i, str)}
y = [[i] + [sub[0] for sub in x if i in sub] for i in letters]
print(y)  # e.g. [['e', 2], ['d', 4], ['a', 2, 5], ['b', 4], ['c', 4, 2, 5]] (row order varies because letters is a set)
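
A quick check that the two approaches in this thread produce the same rows. It is a sketch on the toy data only, not a benchmark. The comprehension version rescans every sublist once per distinct letter (each `i in sub` is a linear search), so its cost grows roughly with the number of distinct letters times the total number of items, whereas the defaultdict version touches each item once.

```python
from collections import defaultdict

x = [[4, 'c', 'b', 'd'], [2, 'e', 'c', 'a'], [5, 'a', 'c']]

# Single pass: every item is visited exactly once.
d = defaultdict(list)
for group in x:
    for item in group[1:]:
        d[item].append(group[0])
single_pass = [[k] + v for k, v in d.items()]

# Comprehension version: scans all of x again for each distinct letter.
letters = {i for y in x for i in y if isinstance(i, str)}
rescan = [[i] + [sub[0] for sub in x if i in sub] for i in letters]

# Same rows, possibly in a different order, so compare after sorting.
print(sorted(single_pass) == sorted(rescan))
```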

An efficient way to transpose a list of lists
