Joining two nested lists in Python

Date: 2014-11-03 16:55:04

Tags: python list join filtering

I have two nested lists of strings:

listA = [["SomeString1", "A", "1"],
         ["SomeString2", "A", "2"],
         ["SomeString3", "B", "1"],
         ["SomeString4", "B", "2"]]


listB = [["OtherString1", "A", "1"],
         ["OtherString2", "A", "2"],
         ["OtherString3", "B", "1"],
         ["OtherString4", "B", "2"]]

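For each sublist in A, I want to find the sublist in B where (sublistB[1] == sublistA[1]) and (sublistB[2] == sublistA[2]) (zero-indexed).

I then want to append the first entry of the 'B' sublist to the 'A' sublist, so that the final output is: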

joined = [["SomeString1", "A", "1", "OtherString1"],
         ["SomeString2", "A", "2", "OtherString2"],
         ["SomeString3", "B", "1", "OtherString3"],
         ["SomeString4", "B", "2", "OtherString4"]]

Or, even better, insert the entry at position 1:

joined = [["SomeString1", "OtherString1", "A", "1"],
         ["SomeString2", "OtherString2", "A", "2"],
         ["SomeString3", "OtherString3", "B", "1"],
         ["SomeString4", "OtherString4", "B", "2"]]

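What is the best way to do this in Python? I have an implementation, but it uses 3 nested loops and takes a while. I feel that map, filter and/or reduce might help, but I'm not sure how to apply them.

Note that the lists are not necessarily neatly ordered as in my example.

Also, and this is important: the lists may differ in length, and there is no guarantee that every sublist has a match. If no match is found, I would like to append None.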

3 answers:

Answer 0 (score: 2)

Use a dictionary to 'index' the strings from listB:

listBstrings = {tuple(lst[1:]): lst[0] for lst in listB}

This maps (listB[x][1], listB[x][2]) tuples to the listB[x][0] strings. Now you can look these up and produce joined in a single loop:

joined = [[lst[0], listBstrings[lst[1], lst[2]]] + lst[1:] for lst in listA]

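The lookup above raises a KeyError if the two elements are never present in listB. You may want to use listBstrings.get((lst[1], lst[2]), '') instead to produce a default empty string; calling .get() without the second argument yields None, which is what the question asks for.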
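As a minimal sketch of that fallback, assuming the None placeholder from the question rather than an empty string. The sample rows are made up for illustration: the lists have different lengths, one listA row has no partner in listB, and listB has rows that no listA row refers to.

```python
# Hypothetical sample data with mismatched lengths and missing matches.
listA = [["SomeString1", "A", "1"],
         ["SomeString2", "A", "2"],
         ["SomeString5", "C", "1"]]
listB = [["OtherString1", "A", "1"],
         ["OtherString2", "A", "2"],
         ["OtherString4", "B", "2"],
         ["OtherString9", "Z", "9"]]

# Index listB by its (column 1, column 2) pair.
listBstrings = {tuple(lst[1:]): lst[0] for lst in listB}

# dict.get returns None when the key is missing, so unmatched rows get None.
joined = [[lst[0], listBstrings.get((lst[1], lst[2]))] + lst[1:] for lst in listA]

print(joined)
```

Rows of listB that nothing in listA refers to are simply never looked up, so they do not appear in the result.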

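All in all, this takes linear time, O(N + M), where N and M are the lengths of the input lists. Compare this to the nested-loop approach, which takes quadratic O(N * M) time. The difference is that two lists of 10 elements each take 20 iterations with the above approach versus 100 for the nested-loop solution; with 100 elements each it is 200 iterations versus 10,000 for the nested loops, and so on.

Demo: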
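The iteration counts quoted above can be reproduced with a small sketch. count_nested and count_indexed are hypothetical helper names introduced here for illustration; they only count loop iterations and do no actual joining.

```python
def count_nested(n, m):
    """Count the comparisons a plain nested-loop join makes over n and m rows."""
    steps = 0
    for _ in range(n):
        for _ in range(m):
            steps += 1
    return steps


def count_indexed(n, m):
    """Count the steps of the dictionary approach: one pass over each list."""
    steps = 0
    for _ in range(m):  # building the index from listB
        steps += 1
    for _ in range(n):  # one dictionary lookup per listA row
        steps += 1
    return steps


for size in (10, 100):
    print(size, count_indexed(size, size), count_nested(size, size))
```

This treats each dictionary insert and lookup as a single step, which holds on average for hashable keys such as the tuples used here.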

>>> from pprint import pprint
>>> listA = [["SomeString1", "A", "1"],
...          ["SomeString2", "A", "2"],
...          ["SomeString3", "B", "1"],
...          ["SomeString4", "B", "2"]]
>>> listB = [["OtherString1", "A", "1"],
...          ["OtherString2", "A", "2"],
...          ["OtherString3", "B", "1"],
...          ["OtherString4", "B", "2"]]
>>> listBstrings = {tuple(lst[1:]): lst[0] for lst in listB}
>>> joined = [[lst[0], listBstrings[lst[1], lst[2]]] + lst[1:] for lst in listA]
>>> pprint(joined)
[['SomeString1', 'OtherString1', 'A', '1'],
 ['SomeString2', 'OtherString2', 'A', '2'],
 ['SomeString3', 'OtherString3', 'B', '1'],
 ['SomeString4', 'OtherString4', 'B', '2']]

Answer 1 (score: 0)

A similar approach to @MartijnPieters' answer, but using a generator expression passed to dict():

from pprint import pprint
listA = [["SomeString1", "A", "1"],
         ["SomeString2", "A", "2"],
         ["SomeString3", "B", "1"],
         ["SomeString4", "B", "2"],
         ["SomeString5", "C", "1"]]
listB = [["OtherString1", "A", "1"],
         ["OtherString2", "A", "2"],
         ["OtherString3", "B", "1"],
         ["OtherString4", "B", "2"], 
         ["OtherString5", "C", "2"]]
dictB = dict( ((x[1], x[2]), x[0]) for x in listB )
joined = [ [ a[0], dictB.get((a[1], a[2])), a[1], a[2] ] for a in listA ]
pprint(joined)

Result:

[['SomeString1', 'OtherString1', 'A', '1'],
 ['SomeString2', 'OtherString2', 'A', '2'],
 ['SomeString3', 'OtherString3', 'B', '1'],
 ['SomeString4', 'OtherString4', 'B', '2'],
 ['SomeString5', None, 'C', '1']]

I'm not sure whether passing a generator to dict() evaluates any faster, but it might save some memory.
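For comparison, here is a small sketch that checks the two spellings build the same dictionary. It makes no claim about speed or memory, and the sample rows are made up for illustration.

```python
# Hypothetical sample data.
listB = [["OtherString1", "A", "1"],
         ["OtherString2", "A", "2"],
         ["OtherString5", "C", "2"]]

# Generator expression passed to dict(), as in this answer.
from_generator = dict(((x[1], x[2]), x[0]) for x in listB)

# Dict comprehension, as in answer 0.
from_comprehension = {(x[1], x[2]): x[0] for x in listB}

print(from_generator == from_comprehension)
```

Since both produce an equal dictionary, the choice between them is mostly a matter of style.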


Another variant is to build two dictionaries and iterate over the items of one of them:

dictA = dict( ((x[1], x[2]), x[0]) for x in listA )
dictB = dict( ((x[1], x[2]), x[0]) for x in listB )
# items() works on both Python 2 and 3; the original iteritems() exists only on Python 2
joined = [ [ v, dictB.get(k), k[0], k[1] ] for k, v in dictA.items() ]

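More knowledgeable Pythonistas are welcome to comment on the pros and cons of these two approaches (or I may post that as a separate question).

Answer 2 (score: 0)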
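One difference that can be checked directly: the two-dictionary variant keys listA by its (column 1, column 2) pair, so listA rows that share a key collapse into one. A minimal sketch, using made-up rows with a deliberate duplicate key:

```python
# Hypothetical sample data: the first two listA rows share the key ("A", "1").
listA = [["SomeString1", "A", "1"],
         ["SomeString1b", "A", "1"],
         ["SomeString5", "C", "1"]]
listB = [["OtherString1", "A", "1"]]

dictA = dict(((x[1], x[2]), x[0]) for x in listA)
dictB = dict(((x[1], x[2]), x[0]) for x in listB)

# First variant: iterate over listA itself, so every row survives.
per_row = [[a[0], dictB.get((a[1], a[2])), a[1], a[2]] for a in listA]

# Second variant: iterate over dictA, so rows sharing a key collapse to the last one.
per_key = [[v, dictB.get(k), k[0], k[1]] for k, v in dictA.items()]

print(len(per_row), len(per_key))
```

The second variant also returns rows in dictionary order, which matches the order of listA only on Python versions where dicts preserve insertion order (guaranteed from 3.7).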

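Here is my implementation of a nested-loop join. It takes the two lists plus two further lists holding the indexes of the columns to join on. For example, to join a[1] to b[2] and a[2] to b[3], the call would look like this: join(a, [1, 2], b, [2, 3])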

listA = [["SomeString1", "A", "1"],
         ["SomeString2", "A", "2"],
         ["SomeString3", "B", "1"],
         ["SomeString4", "B", "2"]]


listB = [["OtherString1", "A", "1"],
         ["OtherString2", "A", "2"],
         ["OtherString3", "B", "1"],
         ["OtherString4", "B", "2"]]

def join(a, a_keys, b, b_keys):
    """Nested-loop join: pair each row of a with every row of b that agrees on the key columns."""
    joined = []
    for a_rec in a:
        for b_rec in b:
            # A pair matches when every key column of a_rec equals its counterpart in b_rec.
            if all(a_rec[ak] == b_rec[bk] for ak, bk in zip(a_keys, b_keys)):
                joined.append([a_rec, b_rec])
    return joined

print(join(listA,[1,2],listB,[1,2]))
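The same interface can also be backed by a dictionary index instead of nested loops. The sketch below is not part of the answer above: hash_join is a hypothetical name, and it assumes the key column values are hashable (strings, as in the question).

```python
from collections import defaultdict


def hash_join(a, a_keys, b, b_keys):
    """Dictionary-based variant of join(): same arguments, same list of [a_rec, b_rec] pairs."""
    # Group b's rows by their key columns so rows with duplicate keys are all kept.
    index = defaultdict(list)
    for b_rec in b:
        index[tuple(b_rec[k] for k in b_keys)].append(b_rec)

    joined = []
    for a_rec in a:
        key = tuple(a_rec[k] for k in a_keys)
        # index.get avoids creating empty entries for keys with no match.
        for b_rec in index.get(key, []):
            joined.append([a_rec, b_rec])
    return joined


# Hypothetical sample data: only the first listA row has a partner in listB.
listA = [["SomeString1", "A", "1"], ["SomeString3", "B", "1"]]
listB = [["OtherString1", "A", "1"], ["OtherString4", "B", "2"]]
print(hash_join(listA, [1, 2], listB, [1, 2]))
```

Like join(), this is an inner join: rows of a without a match are dropped rather than padded with None.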