Compare all lists in a list of lists based on a condition and group them by their differences

Asked: 2018-09-24 19:14:29

Tags: python arrays python-3.x list nested-lists

I have the following list of lists:

a = [[1,2,3,4,5], [4,5,6,7,8], [1,2,3,4], [4,5,6,7,8,9], [2,3,4,5,6,7,8], [6,7,8,9], [5,6,7,8,9], [2,3,4,5,6], [3,4,5,6], [11,12,13,14,15], [13,14,15]]

Here they are with their indices, for easier reference:

0 [1, 2, 3, 4, 5]
1 [4, 5, 6, 7, 8]
2 [1, 2, 3, 4]
3 [4, 5, 6, 7, 8, 9]
4 [2, 3, 4, 5, 6, 7, 8]
5 [6, 7, 8, 9]
6 [5, 6, 7, 8, 9]
7 [2, 3, 4, 5, 6]
8 [3, 4, 5, 6]
9 [11, 12, 13, 14, 15]
10 [13, 14, 15]

I expect the output to be a list of tuples like this:

output = [(0,2,1), (3,1,1), (4,7,2), (4,1,2), (6,5,1), (3,5,2), (3,6,1), (7,8,1), (9,10,2)]

For example, to explain the first item of the output, i.e. (0,2,1):

0 ---> index of the longer list in the comparison
2 ---> index of the shorter list in the comparison
1 ---> difference in length between lists 0 and 2
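
To make the tuple layout concrete, here is a minimal sketch of how one such tuple could be built for a pair of indices. The helper name `describe_pair` is hypothetical and only illustrates the format; it does not check whether the two lists are actually similar, and only the first three lists from above are used.

```python
# First three lists from the question, kept short for illustration.
a = [[1, 2, 3, 4, 5], [4, 5, 6, 7, 8], [1, 2, 3, 4]]


def describe_pair(lists, i, j):
    """Return (index_of_longer, index_of_shorter, length_difference)."""
    # Put the index of the longer list first, whatever order i and j came in.
    longer, shorter = (i, j) if len(lists[i]) >= len(lists[j]) else (j, i)
    return (longer, shorter, len(lists[longer]) - len(lists[shorter]))


print(describe_pair(a, 2, 0))  # (0, 2, 1)
```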

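Now, to the problem:

I have a list of lists in which some lists contain the same items but differ in length by one or two (or three) items at the start or the end of the list.

I would like to sort, group, and identify the indices of these lists, together with their length differences, as tuples.

I went through multiple Stack Overflow questions but couldn't find a similar problem.

I am new to Python. I started with the code below and got stuck: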

from itertools import groupby

a = sorted(a, key=len)

incr = [list(g) for k, g in groupby(a, key=len)]

decr = list(reversed(incr))

ndecr = [i for j in decr for i in j]

for i in range(len(ndecr)-1):
    if len(ndecr[i]) - len(ndecr[i+1]) == 1:
        print(ndecr[i])

for i in range(len(ndecr)-2):
    if len(ndecr[i]) - len(ndecr[i+2]) == 2:
        print(ndecr[i])

for i in ndecr:
    ele = i
    ndecr.remove(i)
    for j in ndecr:
        if ele[:-1] == j:
            print(j)   

for i in ndecr:
    ele = i
    ndecr.remove(i)
    for j in ndecr:
        if ele[:-2] == j:
            print(i)

Please help me with an approach that produces the required output.

3 Answers:

Answer 0: (score: 3)

IIUC, and assuming the total number of lists is small, so that len(lists)^2 is still small, something like the following should work:

from itertools import combinations

# sort by length but preserve the index
ax = sorted(enumerate(a), key=lambda x: len(x[1]))

done = []

for (i0, seq0), (i1, seq1) in combinations(ax, 2):
    if seq1[:len(seq0)] == seq0 or seq1[-len(seq0):] == seq0:
        done.append((i1, i0, len(seq1)-len(seq0)))

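Running it and sorting the result gives me the following, which matches your output except for the order and the fact that you actually listed (4,7,2) twice: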

In [117]: sorted(done)
Out[117]: 
[(0, 2, 1),
 (3, 1, 1),
 (3, 5, 2),
 (3, 6, 1),
 (4, 1, 2),
 (4, 7, 2),
 (6, 5, 1),
 (7, 8, 1),
 (9, 10, 2)]

In the code,

seq1[:len(seq0)] == seq0

is the "does seq1 start with seq0?" condition, and

seq1[-len(seq0):] == seq0

is the "does seq1 end with seq0?" condition.
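
As a rough sketch (not part of the original answer), the prefix/suffix test can be pulled into a small helper, which also makes it easy to cap the allowed length difference. The names `is_prefix_or_suffix`, `group_pairs`, and the `max_diff` parameter are assumptions made for illustration. Unlike the loop above, this version skips equal-length lists, so identical lists are not reported with a difference of 0.

```python
from itertools import combinations

a = [[1, 2, 3, 4, 5], [4, 5, 6, 7, 8], [1, 2, 3, 4], [4, 5, 6, 7, 8, 9],
     [2, 3, 4, 5, 6, 7, 8], [6, 7, 8, 9], [5, 6, 7, 8, 9], [2, 3, 4, 5, 6],
     [3, 4, 5, 6], [11, 12, 13, 14, 15], [13, 14, 15]]


def is_prefix_or_suffix(shorter, longer):
    """True if `shorter` equals the start or the end of `longer`."""
    n = len(shorter)
    return longer[:n] == shorter or longer[len(longer) - n:] == shorter


def group_pairs(lists, max_diff=None):
    """Return sorted (longer_index, shorter_index, length_difference) tuples.

    max_diff is a hypothetical cap on the length difference; None means no cap.
    """
    # Sort by length but keep each list's original index.
    indexed = sorted(enumerate(lists), key=lambda pair: len(pair[1]))
    result = []
    for (i0, seq0), (i1, seq1) in combinations(indexed, 2):
        diff = len(seq1) - len(seq0)
        # Assumption: equal-length lists are not of interest here.
        if diff == 0 or (max_diff is not None and diff > max_diff):
            continue
        if is_prefix_or_suffix(seq0, seq1):
            result.append((i1, i0, diff))
    return sorted(result)


pairs = group_pairs(a, max_diff=2)
print(pairs)
```

With `max_diff=2` this should reproduce the nine tuples shown above; with `max_diff=1` only the pairs that differ by a single item remain.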

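Answer 1: (score: 1)

Edit (original below):

I may understand you better now (thanks to the clarifying comments from @vash_the_stampede). This approach nests two loops to compare every list in the list of lists with every other list and determine whether one is a subset of the other. If the compared lists are in a superset/subset relationship, it adds a tuple to the output list. Each tuple contains the indices of the two compared lists, longer list first, followed by the difference in their lengths.

Important: this approach does not take the order of the items into account, so it may produce output you don't want. For example, [1,2,4,5] is a subset of [1,2,3,4,5] with a length difference of 1. Or, specific to your example, this approach outputs one extra tuple compared with your sample output, because [3,4,5,6] at index 8 is a subset of [2,3,4,5,6,7,8] at index 4, with a length difference of 3. @DSM's answer handles this case, so it is probably closer to what you need.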
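
To illustrate the caveat, here is a small sketch comparing the set-based test with an ordered prefix/suffix test on the two pairs mentioned above. The helper names `set_match` and `ordered_match` are made up for this illustration.

```python
def set_match(shorter, longer):
    """Set-based test: ignores order and position of the items."""
    return set(shorter).issubset(longer)


def ordered_match(shorter, longer):
    """Ordered test: shorter must be the start or the end of longer."""
    n = len(shorter)
    return longer[:n] == shorter or longer[len(longer) - n:] == shorter


# A list with a gap in the middle is still a subset.
print(set_match([1, 2, 4, 5], [1, 2, 3, 4, 5]))      # True
print(ordered_match([1, 2, 4, 5], [1, 2, 3, 4, 5]))  # False

# Index 8 vs index 4 from the question: a subset, but in the middle.
print(set_match([3, 4, 5, 6], [2, 3, 4, 5, 6, 7, 8]))      # True
print(ordered_match([3, 4, 5, 6], [2, 3, 4, 5, 6, 7, 8]))  # False
```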

Example with the current dataset:

a = [[1,2,3,4,5], [4,5,6,7,8], [1,2,3,4], [4,5,6,7,8,9], [2,3,4,5,6,7,8], [6,7,8,9], [5,6,7,8,9], [2,3,4,5,6], [3,4,5,6], [11,12,13,14,15], [13,14,15]]

output = []
for i in range(len(a)):
    for j in range(i + 1, len(a)):
        if set(a[i]).issubset(a[j]) or set(a[i]).issuperset(a[j]):
            diff = abs(len(a[i]) - len(a[j]))
            if len(a[i]) > len(a[j]):
                output.append((i, j, diff))
            else:
                output.append((j, i, diff))

print(output)

# OUTPUT
# [(0, 2, 1), (3, 1, 1), (4, 1, 2), (3, 5, 2), (3, 6, 1), (4, 7, 2), (4, 8, 3), (6, 5, 1), (7, 8, 1), (9, 10, 2)]

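Original:

If I understand you correctly, you can nest a couple of loops to compare every list in the list with every other list. Then build an output list of tuples, each containing the indices of the two compared lists and the difference in their lengths. For example: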

a = [[1,2,3,4,5], [4,5,6,7,8], [1,2,3,4], [4,5,6,7,8,9], [2,3,4,5,6,7,8], [6,7,8,9], [5,6,7,8,9], [2,3,4,5,6], [3,4,5,6], [11,12,13,14,15], [13,14,15]]

output = []
for i in range(len(a)):
    for j in range(i + 1, len(a)):
        diff = abs(len(a[i]) - len(a[j]))
        output.append((i, j, diff))

print(output)

# OUTPUT
# [(0, 1, 0), (0, 2, 1), (0, 3, 1), (0, 4, 2), (0, 5, 1), (0, 6, 0), (0, 7, 0), (0, 8, 1), (0, 9, 0), (0, 10, 2), (1, 2, 1), (1, 3, 1), (1, 4, 2), (1, 5, 1), (1, 6, 0), (1, 7, 0), (1, 8, 1), (1, 9, 0), (1, 10, 2), (2, 3, 2), (2, 4, 3), (2, 5, 0), (2, 6, 1), (2, 7, 1), (2, 8, 0), (2, 9, 1), (2, 10, 1), (3, 4, 1), (3, 5, 2), (3, 6, 1), (3, 7, 1), (3, 8, 2), (3, 9, 1), (3, 10, 3), (4, 5, 3), (4, 6, 2), (4, 7, 2), (4, 8, 3), (4, 9, 2), (4, 10, 4), (5, 6, 1), (5, 7, 1), (5, 8, 0), (5, 9, 1), (5, 10, 1), (6, 7, 0), (6, 8, 1), (6, 9, 0), (6, 10, 2), (7, 8, 1), (7, 9, 0), (7, 10, 2), (8, 9, 1), (8, 10, 1), (9, 10, 2)]

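Answer 2: (score: 1)

Hmm, I'm pretty sure there are more efficient ways to do this. What I did was create a copy of the original list, then shorten each item by one or two elements at either end, compare the result with the other lists, and return the indices along with the corresponding length difference. It works, but it is rather long-winded.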

l1 = a[:]

tups = []
for idx, item in enumerate(l1):
    for x, i in enumerate(a):
        if sorted(item[:-1]) == sorted(i):
            tups.append((idx, x, 1))
        elif sorted(item[:-2]) == sorted(i):
            tups.append((idx, x, 2))
        elif sorted(item[1:]) == sorted(i):
            tups.append((idx, x, 1))
        elif sorted(item[2:]) == sorted(i):
            tups.append((idx, x, 2))

print(tups)
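
The four branches above follow one pattern, so they could be folded into a single check on the length difference. The following is only a sketch under some assumptions: the function name `trimmed_matches` and the `max_trim` parameter are hypothetical, and the lists are compared directly rather than through `sorted()`, so item order matters here.

```python
a = [[1, 2, 3, 4, 5], [4, 5, 6, 7, 8], [1, 2, 3, 4], [4, 5, 6, 7, 8, 9],
     [2, 3, 4, 5, 6, 7, 8], [6, 7, 8, 9], [5, 6, 7, 8, 9], [2, 3, 4, 5, 6],
     [3, 4, 5, 6], [11, 12, 13, 14, 15], [13, 14, 15]]


def trimmed_matches(lists, max_trim=2):
    """Return (longer_index, shorter_index, k) tuples where dropping k items
    from one end of the longer list yields the shorter list."""
    tups = []
    for idx, item in enumerate(lists):
        for x, other in enumerate(lists):
            k = len(item) - len(other)
            # Only consider pairs whose length difference is 1..max_trim.
            if not 1 <= k <= max_trim:
                continue
            # item[:-k] drops k items from the end, item[k:] from the start.
            if item[:-k] == other or item[k:] == other:
                tups.append((idx, x, k))
    return tups


print(trimmed_matches(a))
```

Raising `max_trim` to 3 would cover the "or three" case from the question without adding more branches.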