I have two nested lists of strings:
listA = [["SomeString1", "A", "1"],
["SomeString2", "A", "2"],
["SomeString3", "B", "1"],
["SomeString4", "B", "2"]]
listB = [["OtherString1", "A", "1"],
["OtherString2", "A", "2"],
["OtherString3", "B", "1"],
["OtherString4", "B", "2"]]
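For each list in A, I want to find the list in B where (sublistB[1] == sublistA[1]) and (sublistB[2] == sublistA[2]) (zero-indexed).
Then I want to append the first entry of that B sublist to the A sublist, so that the final output is: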
joined = [["SomeString1", "A", "1", "OtherString1"],
["SomeString2", "A", "2", "OtherString2"],
["SomeString3", "B", "1", "OtherString3"],
["SomeString4", "B", "2", "OtherString4"]]
Or even better, insert the entry at position 1:
joined = [["SomeString1", "OtherString1", "A", "1"],
["SomeString2", "OtherString2", "A", "2"],
["SomeString3", "OtherString3", "B", "1"],
["SomeString4", "OtherString4", "B", "2"]]
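What is the best way to do this in Python? I have an implementation, but it uses 3 nested loops, which takes a while.
I feel like map, filter and/or reduce might help, but I'm not sure how to implement it.
Note that the lists are not necessarily ordered as neatly as in my example.
Also, and this is very important: the lists may have different lengths, and there is no guarantee that every sublist has a match. If no match is found, I want to append None.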
Answer 0 (score: 2)
Use a dictionary to index the strings from listB:
listBstrings = {tuple(lst[1:]): lst[0] for lst in listB}
This maps (listB[x][1], listB[x][2]) tuples to the listB[x][0] strings. Now you can look these up and produce joined in a single loop:
joined = [[lst[0], listBstrings[lst[1], lst[2]]] + lst[1:] for lst in listA]
You may want to use listBstrings.get((lst[1], lst[2]), '') to produce a default empty string when the two elements never occur together in listB.
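Since the question asks for None rather than an empty string when nothing matches, here is a minimal sketch of the same dict lookup using .get() with its default of None, inserting the match at position 1. The function name join_on_columns is hypothetical, and it assumes every sublist has the layout [string, key1, key2]; if listB contains the same key pair twice, the last one wins.

```python
def join_on_columns(list_a, list_b):
    # Index listB once: (key1, key2) -> leading string.
    # Duplicate key pairs in listB overwrite earlier ones (last one wins).
    lookup = {tuple(row[1:3]): row[0] for row in list_b}
    # One pass over listA; dict.get returns None when there is no match.
    return [[row[0], lookup.get((row[1], row[2]))] + row[1:] for row in list_a]


# Made-up data: unordered, different lengths, one row without a match.
listA = [["SomeString1", "A", "1"], ["SomeString5", "C", "1"]]
listB = [["OtherString5", "C", "2"],
         ["OtherString2", "A", "2"],
         ["OtherString1", "A", "1"]]
print(join_on_columns(listA, listB))
# [['SomeString1', 'OtherString1', 'A', '1'], ['SomeString5', None, 'C', '1']]
```

Because the lookup is keyed by value, the order of the rows in either list does not matter.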
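All in all, this takes linear time O(N + M) (assuming average-case O(1) dict lookups), where N and M are the lengths of the input lists. Compare this with the nested-loop approach, which takes quadratic O(N * M) time. In practice, for two 10-element lists the approach above takes 20 iterations versus 100 for the nested-loop solution; with 100 elements it is 200 versus 10,000, and so on.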
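To make those iteration counts concrete, here is a small sketch that instruments both strategies with a step counter. The helper names and the synthetic rows are made up for illustration, and it counts loop iterations only, not wall-clock time.

```python
def count_dict_join(list_a, list_b):
    steps = 0
    lookup = {}
    for row in list_b:  # M steps to build the index
        lookup[(row[1], row[2])] = row[0]
        steps += 1
    for row in list_a:  # N steps to look each key up
        lookup.get((row[1], row[2]))
        steps += 1
    return steps


def count_nested_join(list_a, list_b):
    steps = 0
    for a in list_a:
        for b in list_b:  # N * M pairwise comparisons
            _ = (a[1], a[2]) == (b[1], b[2])
            steps += 1
    return steps


for size in (10, 100):
    rows = [["s%d" % i, str(i), "x"] for i in range(size)]
    print(size, count_dict_join(rows, rows), count_nested_join(rows, rows))
# 10 20 100
# 100 200 10000
```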
Demo:
>>> from pprint import pprint
>>> listA = [["SomeString1", "A", "1"],
... ["SomeString2", "A", "2"],
... ["SomeString3", "B", "1"],
... ["SomeString4", "B", "2"]]
>>> listB = [["OtherString1", "A", "1"],
... ["OtherString2", "A", "2"],
... ["OtherString3", "B", "1"],
... ["OtherString4", "B", "2"]]
>>> listBstrings = {tuple(lst[1:]): lst[0] for lst in listB}
>>> joined = [[lst[0], listBstrings[lst[1], lst[2]]] + lst[1:] for lst in listA]
>>> pprint(joined)
[['SomeString1', 'OtherString1', 'A', '1'],
['SomeString2', 'OtherString2', 'A', '2'],
['SomeString3', 'OtherString3', 'B', '1'],
['SomeString4', 'OtherString4', 'B', '2']]
Answer 1 (score: 0)
A similar approach to @MartijnPieters' answer, but using a dict generator (a generator expression passed to dict()):
from pprint import pprint
listA = [["SomeString1", "A", "1"],
["SomeString2", "A", "2"],
["SomeString3", "B", "1"],
["SomeString4", "B", "2"],
["SomeString5", "C", "1"]]
listB = [["OtherString1", "A", "1"],
["OtherString2", "A", "2"],
["OtherString3", "B", "1"],
["OtherString4", "B", "2"],
["OtherString5", "C", "2"]]
dictB = dict( ((x[1], x[2]), x[0]) for x in listB )
joined = [ [ a[0], dictB.get((a[1], a[2])), a[1], a[2] ] for a in listA ]
pprint(joined)
Result:
[['SomeString1', 'OtherString1', 'A', '1'],
['SomeString2', 'OtherString2', 'A', '2'],
['SomeString3', 'OtherString3', 'B', '1'],
['SomeString4', 'OtherString4', 'B', '2'],
['SomeString5', None, 'C', '1']]
I'm not sure whether using a dict generator leads to faster evaluation, but it may save memory.
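For comparison, a quick check that the dict(...) generator form and a dict comprehension build the same mapping. Neither form creates an intermediate list, since both consume listB lazily, so the resulting dicts are identical.

```python
listB = [["OtherString1", "A", "1"], ["OtherString2", "A", "2"]]

# dict() fed a generator expression of (key, value) pairs
from_generator = dict(((x[1], x[2]), x[0]) for x in listB)

# the equivalent dict comprehension
from_comprehension = {(x[1], x[2]): x[0] for x in listB}

print(from_generator == from_comprehension)  # True
```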
Another variant is to build two dictionaries and iterate over the items of one of them:
dictA = dict( ((x[1], x[2]), x[0]) for x in listA )
dictB = dict( ((x[1], x[2]), x[0]) for x in listB )
joined = [ [ v, dictB.get(k), k[0], k[1] ] for k, v in dictA.items() ]
More knowledgeable Pythonistas can comment on the pros and cons of these two approaches (or maybe I'll post another question).
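One trade-off worth checking: dictA keys the rows of listA by (key1, key2), so if two rows of listA share the same pair, only the last one survives, whereas iterating over listA directly keeps every row. A small sketch with made-up data shows the difference:

```python
listA = [["SomeString1", "A", "1"], ["SomeString9", "A", "1"]]  # duplicate key pair
listB = [["OtherString1", "A", "1"]]

dictA = dict(((x[1], x[2]), x[0]) for x in listA)  # "SomeString9" overwrites "SomeString1"
dictB = dict(((x[1], x[2]), x[0]) for x in listB)

# Variant 2: iterate over dictA, which has already collapsed the duplicates
via_dictA = [[v, dictB.get(k), k[0], k[1]] for k, v in dictA.items()]
# Variant 1: iterate over listA, keeping every row
via_listA = [[a[0], dictB.get((a[1], a[2])), a[1], a[2]] for a in listA]

print(len(via_dictA), len(via_listA))  # 1 2
```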
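Answer 2 (score: 0)
Here is my implementation of a nested-loop join. It takes the two lists plus two more lists holding the indices of the columns to join on. For example, to join a[1] to b[2] and a[2] to b[3], the call would look like this: join(a, [1, 2], b, [2, 3])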
listA = [["SomeString1", "A", "1"],
["SomeString2", "A", "2"],
["SomeString3", "B", "1"],
["SomeString4", "B", "2"]]
listB = [["OtherString1", "A", "1"],
["OtherString2", "A", "2"],
["OtherString3", "B", "1"],
["OtherString4", "B", "2"]]
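def join(a, a_keys, b, b_keys):
    joined = []
    # Compare every record of a with every record of b: O(len(a) * len(b)).
    for a_rec in a:
        for b_rec in b:
            satisfies_keys = True
            # The records match only if every pair of key columns is equal.
            for l in range(len(a_keys)):
                if a_rec[a_keys[l]] != b_rec[b_keys[l]]:
                    satisfies_keys = False
            if satisfies_keys:
                # Each match is emitted as a [record_from_a, record_from_b] pair.
                joined.append([a_rec, b_rec])
    return joined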
print(join(listA,[1,2],listB,[1,2]))
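The same column-index interface can also be served by a hash join, which is the idea behind Answer 0 generalized to arbitrary key columns. Below is a minimal sketch; hash_join is a hypothetical name, and it assumes a_keys and b_keys have the same length and that the key values are hashable. It groups the records of b by key so that multiple matches are kept, returning the same [a_rec, b_rec] pairs in the same order as the nested-loop version.

```python
from collections import defaultdict


def hash_join(a, a_keys, b, b_keys):
    """Hypothetical hash-based variant of join(a, a_keys, b, b_keys)."""
    # Group every record of b by its key tuple; one key may map to several rows.
    index = defaultdict(list)
    for b_rec in b:
        index[tuple(b_rec[k] for k in b_keys)].append(b_rec)
    joined = []
    for a_rec in a:
        key = tuple(a_rec[k] for k in a_keys)
        # .get avoids inserting empty lists into index for unmatched keys.
        for b_rec in index.get(key, []):
            joined.append([a_rec, b_rec])
    return joined


listA = [["SomeString1", "A", "1"], ["SomeString3", "B", "1"], ["SomeString5", "C", "1"]]
listB = [["OtherString3", "B", "1"], ["OtherString1", "A", "1"]]
print(hash_join(listA, [1, 2], listB, [1, 2]))
```

Like the nested-loop version, this is an inner join: SomeString5 has no partner in listB and is simply left out of the result.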