Fastest way to find the common elements between two lists of lists in Python

Asked: 2018-01-31 07:30:39

Tags: python

I have a list like the one below.

mylist = 
[  
   [  
      [  
         "chocolate_pudding",
         920.8000000000001
      ],
      [  
         "caramel_pudding",
         345.59999999999997
      ],
      [  
         "pudding",
         248.0
      ],
      [  
         "banana_pudding",
         27.599999999999998
      ]
   ],
   [  
      [  
         "biscuits",
         190.8
      ],
      [  
         "chocolates",
         33.599999999999994
      ],
      [  
         "chocolate_pudding",
         920.8000000000001
      ]
   ],
   [  
      [  
         "tiramusu",
         145.8
      ]
   ],
   [  
      [  
         "cakes",
         139.29999999999998
      ]
   ],
   [  
      [  
         "butter_cakes",
         133.0
      ]
   ],
   [  
      [  
         "chocolate_pudding",
         920.8000000000001
      ]
   ]
]

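I want to find the elements that occur more than once across the sublists (for example ["chocolate_pudding", 920.8000000000001]) and remove the duplicates while keeping the first occurrence.

So my output should look like this.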

mylist = 
[  
   [  
      [  
         "chocolate_pudding",
         920.8000000000001
      ],
      [  
         "caramel_pudding",
         345.59999999999997
      ],
      [  
         "pudding",
         248.0
      ],
      [  
         "banana_pudding",
         27.599999999999998
      ]
   ],
   [  
      [  
         "biscuits",
         190.8
      ],
      [  
         "chocolates",
         33.599999999999994
      ]
   ],
   [  
      [  
         "tiramusu",
         145.8
      ]
   ],
   [  
      [  
         "cakes",
         139.29999999999998
      ]
   ],
   [  
      [  
         "butter_cakes",
         133.0
      ]
   ]
]

The code I have been trying is shown below.

mylist_copy = mylist

for item in mylist:
    myindex = mylist.index(item)
    #print(item)

    for single_item in item:
        #print(single_item)
        for item_copy in mylist_copy:
            if mylist_copy.index(item_copy) != myindex:
                if single_item in item_copy:
                    print(single_item)

Because it uses so many for loops, I would like a more efficient approach. Note: I also tried this:

mylist_copy = mylist

for item in mylist:
    myindex = mylist.index(item)
    for item_copy in mylist_copy:
          if mylist_copy.index(item_copy) != myindex:
                print(set(item).intersection(item_copy))

However, set intersection does not work on lists.
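Set intersection fails here because the inner pairs are lists, and lists are unhashable, so they cannot be set members. A minimal sketch, assuming two small sample sublists in the same shape as mylist (the names `a`, `b`, `error_message` and `common` are illustrative only), shows the error and the tuple conversion that avoids it:

```python
a = [["chocolate_pudding", 920.8000000000001], ["pudding", 248.0]]
b = [["biscuits", 190.8], ["chocolate_pudding", 920.8000000000001]]

error_message = None
try:
    set(a)  # every element is a list, and lists are unhashable
except TypeError as exc:
    error_message = str(exc)

# Tuples are hashable, so convert the inner pairs before building the set.
common = set(map(tuple, a)).intersection(map(tuple, b))
print(error_message)
print(common)
```

This only works while every value inside a pair is itself hashable, which holds for the strings and floats in the question.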

Is there a simple and fast way to do this in Python?

3 answers:

Answer 0 (score: 2)

Use a set() object and preserve the order of the sublists:

mylist = [[["chocolate_pudding", 920.8000000000001], ["caramel_pudding", 345.59999999999997], 
          ["pudding", 248.0], ["banana_pudding", 27.599999999999998]], [["biscuits", 190.8], 
          ["chocolates", 33.599999999999994], ["chocolate_pudding", 920.8000000000001]], 
          [["tiramusu", 145.8]], [["cakes", 139.29999999999998]], [["butter_cakes", 133.0]], 
          [["chocolate_pudding", 920.8000000000001]]]

result, foods = [], set()
for sub_l in mylist:
    new_sublist = []
    for i in sub_l:
        if i[0] not in foods:     # on the 1st occurrence of `foodstuff` name
            new_sublist.append(i)
            foods.add(i[0])       # add `foodstuff` into set of unique foods
    if new_sublist: result.append(new_sublist)

print(result)

Output:

[[['chocolate_pudding', 920.8000000000001], ['caramel_pudding', 345.59999999999997], ['pudding', 248.0], ['banana_pudding', 27.599999999999998]], [['biscuits', 190.8], ['chocolates', 33.599999999999994]], [['tiramusu', 145.8]], [['cakes', 139.29999999999998]], [['butter_cakes', 133.0]]]

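Answer 1 (score: 1)

You can flatten the inner lists and put them all into a set. A set cannot contain duplicates, so you do not even have to check for them; the set does that for you in very little time. The only caveat is that a set cannot contain lists, so they have to be converted to tuples first. If you are fine with these two type conversions, it can be done with a simple set comprehension and should be reasonably fast: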

no_duplicates = {tuple(inner) for outer in mylist for inner in outer}

Or convert the types back afterwards:

no_dupe_lists = list(map(list, no_duplicates))
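Note that a set is unordered and the comprehension flattens the outer grouping, so the result is a flat collection of pairs in arbitrary order. If first-seen order matters, one possible variation (a sketch, not part of the original answer; `sample`, `ordered_unique` and `ordered_lists` are illustrative names) is to use dict keys, which are unique and keep insertion order from Python 3.7 onward:

```python
sample = [
    [["chocolate_pudding", 920.8000000000001], ["pudding", 248.0]],
    [["biscuits", 190.8], ["chocolate_pudding", 920.8000000000001]],
]

# dict keys are unique and remember the order in which they were first inserted.
ordered_unique = list(dict.fromkeys(
    tuple(inner) for outer in sample for inner in outer
))

# Convert the tuples back to lists, as in the answer above.
ordered_lists = [list(pair) for pair in ordered_unique]
print(ordered_lists)
```

This still returns a flat list; it does not rebuild the nested sublists that the question asks for.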

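You did not ask about this, but if you want to copy a list you have to use one of the proper copying idioms: mylist_copy = list(mylist), mylist_copy = mylist[:], or mylist_copy = [element for element in mylist]. The first is the recommended one. A plain mylist_copy = mylist only binds a second name to the same list object.

Since your list contains nested lists, those need to be copied as well: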

mylist_copy = [[list(inner) for inner in outer] for outer in mylist]
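The difference between these kinds of copy can be checked directly. The sketch below uses a small stand-in list (`original`, `alias`, `shallow` and `deep` are illustrative names) and the standard library's copy.deepcopy, which is an alternative to the nested comprehension for arbitrarily deep nesting:

```python
import copy

original = [[["pudding", 248.0]], [["cakes", 139.29999999999998]]]

alias = original                # same object under a second name, not a copy
shallow = list(original)        # new outer list, but the inner lists are shared
deep = copy.deepcopy(original)  # fully independent copy at every level

# Mutate an innermost value through the original.
original[0][0][1] = 0.0

print(alias[0][0][1])    # the alias sees the change
print(shallow[0][0][1])  # the shallow copy sees it too
print(deep[0][0][1])     # the deep copy keeps the old value
```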

Answer 2 (score: 1)

A great man once said: take only what you want, why delete anything? Following that advice:

mylist = [[["chocolate_pudding", 920.8000000000001], ["caramel_pudding", 345.59999999999997],
          ["pudding", 248.0], ["banana_pudding", 27.599999999999998]], [["biscuits", 190.8],
          ["chocolates", 33.599999999999994], ["chocolate_pudding", 920.8000000000001]],
          [["tiramusu", 145.8]], [["cakes", 139.29999999999998]], [["butter_cakes", 133.0]],
          [["chocolate_pudding", 920.8000000000001]]]


result = []
track = []
for i in mylist:
    sublist = []
    for k in i:
        if k not in track:
            track.append(k)
            sublist.append(k)
    if sublist:
        result.append(sublist)

print(result)

Output:

[[['chocolate_pudding', 920.8000000000001], ['caramel_pudding', 345.59999999999997], ['pudding', 248.0], ['banana_pudding', 27.599999999999998]], [['biscuits', 190.8], ['chocolates', 33.599999999999994]], [['tiramusu', 145.8]], [['cakes', 139.29999999999998]], [['butter_cakes', 133.0]]]
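Because `track` is a list, each `k not in track` test is a linear scan, so the loop as a whole grows quadratically with the number of pairs. A possible variation (a sketch, assuming every pair contains only hashable values; `dedupe_pairs`, `seen` and `sample` are illustrative names, not taken from the answers) keeps the same full-pair comparison but tracks tuples in a set for average constant-time lookups:

```python
def dedupe_pairs(nested):
    """Keep the first occurrence of each [name, value] pair; drop emptied sublists."""
    seen = set()
    result = []
    for sub in nested:
        kept = []
        for pair in sub:
            key = tuple(pair)  # lists are unhashable, tuples are not
            if key not in seen:
                seen.add(key)
                kept.append(pair)
        if kept:  # skip sublists whose pairs were all duplicates
            result.append(kept)
    return result


sample = [
    [["chocolate_pudding", 920.8000000000001], ["pudding", 248.0]],
    [["biscuits", 190.8], ["chocolate_pudding", 920.8000000000001]],
    [["chocolate_pudding", 920.8000000000001]],
]
print(dedupe_pairs(sample))
```

Unlike Answer 0, which compares names only, this treats two pairs as duplicates only when both the name and the value match.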