Selecting elements from a list of lists based on length and intersection

Time: 2018-05-31 07:17:35

Tags: python list

l1 = [['a', 'b', 'c'],
      ['a', 'd', 'c'],
      ['a', 'e'],
      ['a', 'd', 'c'],
      ['a', 'f', 'c'],
      ['a', 'e'],
      ['p', 'q', 'r']]

l2 = [1, 1, 1, 2, 0, 0, 0]    

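I have two lists as shown above. l1 is a list of lists, and l2 is another list holding a score for each entry of l1.

Problem: among the lists in l1 whose score (from l2) is 0, find those that are either completely distinct from the others or the shortest among the lists they overlap with.

Example: if the lists [1, 2, 3], [2, 3] and [5, 7] all have a score of 0, I would choose [5, 7], because its elements do not appear in any other list, and [2, 3], because it intersects with [1, 2, 3] but is shorter.

How I currently do this: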
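One way to read the rule in this example is as a pairwise check: a zero-score list is kept unless some strictly shorter zero-score list shares an element with it. The sketch below is a brute-force version of that reading. The function name `keep_shortest_or_disjoint` is hypothetical, and keeping both lists when overlapping lists have equal length is an assumption based on the edit further down in the question.

```python
def keep_shortest_or_disjoint(lists):
    """Keep each list unless a strictly shorter list shares an element with it."""
    sets = [set(lst) for lst in lists]
    kept = []
    for i, lst in enumerate(lists):
        # A list is dropped if any other, strictly shorter list overlaps it.
        dominated = any(
            len(lists[j]) < len(lst) and sets[i] & sets[j]
            for j in range(len(lists))
            if j != i
        )
        if not dominated:
            kept.append(lst)
    return kept


print(keep_shortest_or_disjoint([[1, 2, 3], [2, 3], [5, 7]]))
# prints [[2, 3], [5, 7]]
```

This compares every pair, so it is quadratic in the number of lists; it is meant only to pin down the rule, not to be fast.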

import itertools

# Split by score: zero-score candidates vs. lists that are kept as-is.
l = [x for x, y in zip(l1, l2) if y == 0]
lx = [(x, y) for x, y in zip(l1, l2) if y > 0]
c = list(itertools.combinations(l, 2))

un_usable = []
usable = []
# First pass: for each overlapping pair, keep the shorter list.
for i, j in c:
    intersection = len(set(i).intersection(set(j)))
    if intersection > 0:
        if len(i) < len(j):
            usable.append(i)
            un_usable.append(j)
        else:
            usable.append(j)
            un_usable.append(i)

# Second pass: keep lists from disjoint pairs unless already classified.
for i, j in c:
    intersection = len(set(i).intersection(set(j)))
    if intersection == 0:
        if i not in un_usable and i not in usable:
            usable.append(i)
        if j not in un_usable and j not in usable:
            usable.append(j)

final = lx + [(x, 0) for x in usable]

This finally gives me:

[(['a', 'b', 'c'], 1),
 (['a', 'd', 'c'], 1),
 (['a', 'e'], 1),
 (['a', 'd', 'c'], 2),
 (['a', 'e'], 0),
 (['p', 'q', 'r'], 0)]

This is the required result.

Edit: handling lists of equal length:

l1 = [['a', 'b', 'c'],
      ['a', 'd', 'c'],
      ['a', 'e'],
      ['a', 'd', 'c'],
      ['a', 'f', 'c'],
      ['a', 'e'],
      ['p', 'q', 'r'],
      ['a', 'k']]

l2 = [1, 1, 1, 2, 0, 0, 0, 0]     

l = [x for x, y in zip(l1, l2) if y == 0]
lx = [(x, y) for x, y in zip(l1, l2) if y > 0]
c = list(itertools.combinations(l, 2))
un_usable = []
usable = []
for i, j in c:
    intersection = len(set(i).intersection(set(j)))
    if intersection > 0:
        if len(i) < len(j):
            usable.append(i)
            un_usable.append(j)
        elif len(i) == len(j):
            usable.append(i)
            usable.append(j)
        else:
            usable.append(j)
            un_usable.append(i)

usable = [list(x) for x in set(tuple(x) for x in usable)]
un_usable = [list(x) for x in set(tuple(x) for x in un_usable)]

for i, j in c:
    intersection = len(set(i).intersection(set(j)))
    if intersection == 0:
        if i not in un_usable and i not in usable:
            usable.append(i)
        if j not in un_usable and j not in usable:
            usable.append(j)            

final = lx + [(x, 0) for x in usable]
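The de-duplication step above goes through `set`, which does not preserve the order in which the lists were found. If order matters, one possible alternative is `dict.fromkeys`, which keeps first-seen order in Python 3.7+. The helper name `dedupe_keep_order` is hypothetical; the `usable` value shown is what the first loop produces for the edited data.

```python
def dedupe_keep_order(lists):
    """Remove duplicate lists while preserving first-seen order."""
    # Lists are unhashable, so tuples serve as the dictionary keys.
    return [list(t) for t in dict.fromkeys(tuple(lst) for lst in lists)]


# Contents of `usable` after the first loop on the edited example.
usable = [['a', 'e'], ['a', 'k'], ['a', 'e'], ['a', 'k']]
print(dedupe_keep_order(usable))
# prints [['a', 'e'], ['a', 'k']]
```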

Is there a better, faster, more pythonic way to achieve the same thing?

1 Answer:

Answer 0: (score: 1)

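Assuming I understood correctly, here is an O(N) two-pass algorithm (N being the total number of elements in the zero-score lists).

Steps:

  1. Select the zero-score lists.
  2. For every element of every zero-score list, find the length of the shortest zero-score list in which that element occurs. Call this the element's length score.
  3. For each list, find the minimum length score over all of its elements. If the result is less than the length of the list, discard the list.

def select_lsts(lsts, scores):
    # pick out zero score lists
    z_lsts = [lst for lst, score in zip(lsts, scores) if score == 0]

    # keep track of the shortest length of any list in which an element occurs
    len_shortest = dict()
    for lst in z_lsts:
        ln = len(lst)
        for c in lst:
            len_shortest[c] = min(ln, len_shortest.get(c, float('inf')))

    # keep a list only if it is of minimum length for each of its elements
    for lst in z_lsts:
        len_lst = len(lst)
        if any(len_shortest[c] < len_lst for c in lst):
            continue

        yield lst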
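As a usage sketch, the generator can be combined with the positive-score lists in the same way the question builds `final`. The block below repeats a condensed `select_lsts` so that it runs on its own and uses the data from the question's edit; how `final` is assembled is an assumption mirroring the question's code, not something the answer specifies.

```python
def select_lsts(lsts, scores):
    """Yield zero-score lists that no strictly shorter zero-score list overlaps."""
    z_lsts = [lst for lst, score in zip(lsts, scores) if score == 0]

    # For each element, the length of the shortest zero-score list containing it.
    len_shortest = {}
    for lst in z_lsts:
        for c in lst:
            len_shortest[c] = min(len(lst), len_shortest.get(c, float('inf')))

    # Keep a list only if none of its elements occurs in a shorter list.
    for lst in z_lsts:
        if all(len_shortest[c] >= len(lst) for c in lst):
            yield lst


l1 = [['a', 'b', 'c'],
      ['a', 'd', 'c'],
      ['a', 'e'],
      ['a', 'd', 'c'],
      ['a', 'f', 'c'],
      ['a', 'e'],
      ['p', 'q', 'r'],
      ['a', 'k']]
l2 = [1, 1, 1, 2, 0, 0, 0, 0]

# Lists with a positive score are kept unchanged, as in the question.
lx = [(x, y) for x, y in zip(l1, l2) if y > 0]
final = lx + [(x, 0) for x in select_lsts(l1, l2)]

for item in final:
    print(item)
```

Here `['a', 'f', 'c']` is dropped because `'a'` also occurs in the shorter `['a', 'e']`, while `['a', 'e']`, `['p', 'q', 'r']` and `['a', 'k']` are kept with a score of 0.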