Finding repeated elements in a nested list

Time: 2018-09-17 14:43:42

Tags: python list nested-lists

I have a nested list of elements:

employee_list =  [
    ['Name', '=', 'John'],
    ['Age', '=', '32'],
    ['Weight', '=', '60'],
    ['Name', '=', 'Steve'],
    ['Weight', '=', '85']
]

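I want to create two lists: one containing the repeated elements and the other containing the unique elements. I also want to keep every occurrence of the repeated ones: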

unique_list = [['Age', '=', '32']]

repeated_list = [
    ['Name', '=', 'John'],
    ['Weight', '=', '60'],
    ['Name', '=', 'Steve'],
    ['Weight', '=', '85']
] 

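Uniqueness or repetition is determined by the first element of each sublist, for example 'Name' or 'Weight'. If two sublists both have 'Name' as their first element, I consider them repeated.

Can anyone suggest a simple way to do this?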

5 answers:

Answer 0: (score: 5)

You can use collections.Counter and build the two lists with comprehensions, based on the counts of the significant first elements:

from collections import Counter

c = Counter(l[0] for l in employee_list)
# Counter({'Name': 2, 'Weight': 2, 'Age': 1})

uniq = [l for l in employee_list if c[l[0]] == 1]
# [['Age', '=', '32']]

rept = [l for l in employee_list if c[l[0]] > 1]
# [['Name', '=', 'John'],
#  ['Weight', '=', '60'],
#  ['Name', '=', 'Steve'],
#  ['Weight', '=', '85']]

Update: splitting rept by "key"
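To split the repeated sublists by their key (the first element), one option is a dict of lists. The following is only a minimal sketch of that idea, not the answer's original code; the name by_key is made up for illustration.

```python
from collections import Counter, defaultdict

employee_list = [
    ['Name', '=', 'John'],
    ['Age', '=', '32'],
    ['Weight', '=', '60'],
    ['Name', '=', 'Steve'],
    ['Weight', '=', '85'],
]

# Count how often each first element (the "key") occurs.
c = Counter(row[0] for row in employee_list)
rept = [row for row in employee_list if c[row[0]] > 1]

# Group the repeated rows by key, preserving their original order.
by_key = defaultdict(list)  # hypothetical name, not from the answer
for row in rept:
    by_key[row[0]].append(row)

print(dict(by_key))
```

Each key then maps to all of its sublists, so by_key['Name'] holds both the 'John' and the 'Steve' rows.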

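Answer 1: (score: 0)

You cannot apply Counter to a list of lists directly; it raises

TypeError: unhashable type: 'list'

so we need to convert the sublists to tuples first: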

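# Note: this counts whole rows, so only exact duplicates are detected.
# The output below appears to be for data in which the 'Name' and 'Weight' rows repeat exactly.
employee_tuple = list(map(tuple, employee_list))
# then use Counter
from collections import Counter
d = Counter(employee_tuple)

l = list(map(d.get, employee_tuple))  # get the frequency of each item
l
Out[372]: [2, 1, 2, 2, 2]

# then filter with compress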
from itertools import compress
list(compress(employee_list, map(lambda x: x == 1, l)))
Out[380]: [['Age', '=', '32']]


list(compress(employee_list, map(lambda x: x != 1, l)))
Out[384]: 
[['Name', '=', 'John'],
 ['Weight', '=', '60'],
 ['Name', '=', 'John'],
 ['Weight', '=', '60']]
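Counting whole rows only finds exact duplicates. As a rough sketch, assuming the question's employee_list is used unchanged, every row is distinct as a tuple, so whole-row counting treats all five rows as unique, while counting only the first element gives the split the question asks for. The names row_freq and key_freq are illustrative.

```python
from collections import Counter
from itertools import compress

employee_list = [
    ['Name', '=', 'John'],
    ['Age', '=', '32'],
    ['Weight', '=', '60'],
    ['Name', '=', 'Steve'],
    ['Weight', '=', '85'],
]

# Whole-row counting: lists are unhashable, so each row becomes a tuple.
row_counts = Counter(map(tuple, employee_list))
row_freq = [row_counts[tuple(row)] for row in employee_list]

# Key-based counting: only the first element decides repetition.
key_counts = Counter(row[0] for row in employee_list)
key_freq = [key_counts[row[0]] for row in employee_list]

# compress keeps the rows whose selector is true.
unique_list = list(compress(employee_list, (n == 1 for n in key_freq)))
repeated_list = list(compress(employee_list, (n > 1 for n in key_freq)))

print(row_freq)
print(key_freq)
```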

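Answer 2: (score: 0)

There are several possible solutions, including list comprehensions and filter. You can also use a set to produce the unique collection of elements and convert it back to a list, as shown in the link provided by benvc. Once you have the list of unique elements, you can filter those elements out of the original list to get the resulting list of repeated ones (if any).

See python tips on filter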
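This answer gives no code. One way the set-and-filter idea might look is sketched below, assuming the first element of each sublist is the key; seen and repeated_keys are illustrative names.

```python
employee_list = [
    ['Name', '=', 'John'],
    ['Age', '=', '32'],
    ['Weight', '=', '60'],
    ['Name', '=', 'Steve'],
    ['Weight', '=', '85'],
]

# Collect every key that shows up more than once.
seen = set()
repeated_keys = set()
for key, *_ in employee_list:
    if key in seen:
        repeated_keys.add(key)
    seen.add(key)

# filter keeps the original order of the sublists.
unique_list = list(filter(lambda row: row[0] not in repeated_keys, employee_list))
repeated_list = list(filter(lambda row: row[0] in repeated_keys, employee_list))

print(unique_list)
print(repeated_list)
```

Set membership tests are constant time on average, so this makes two passes over the list rather than one count per row.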

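Answer 3: (score: 0)

If you create a test_list containing all the items in employee_list, you can use the built-in count method to count how often employee_list[i][0] appears in that list. If count == 1, we append the whole item to unique_list; otherwise it goes to repeated_list.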

employee_list =  [
    ['Name', '=', 'John'],
    ['Age', '=', '32'],
    ['Weight', '=', '60'],
    ['Name', '=', 'Steve'],
    ['Weight', '=', '85']
]

unique_list = []
repeated_list = [] 
test_list = []

for i in employee_list:
    for j in i:
        test_list.append(j)

for i in employee_list:
    if test_list.count(i[0]) == 1:
        unique_list.append(i)
    else:
        repeated_list.append(i)

print(f"Repeated: {repeated_list}")
print(f"Unique: {unique_list}")
(xenial)vash@localhost:~/python/stack_overflow$ python3.7 unique.py 
Repeated: [['Name', '=', 'John'], ['Weight', '=', '60'], ['Name', '=', 'Steve'], ['Weight', '=', '85']]
Unique: [['Age', '=', '32']]
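One caveat: test_list also holds the values, so a value that happens to equal a key would inflate that key's count. The sketch below counts only the first elements instead; the extra 'Note' row is a hypothetical addition made up to show the difference, and keys and flat are illustrative names.

```python
employee_list = [
    ['Name', '=', 'John'],
    ['Age', '=', '32'],
    ['Weight', '=', '60'],
    ['Name', '=', 'Steve'],
    ['Weight', '=', '85'],
    ['Note', '=', 'Age'],  # hypothetical row: its value equals the key 'Age'
]

# Flattening everything counts 'Age' twice (once as a key, once as a value).
flat = [item for row in employee_list for item in row]

# Counting only first elements avoids that collision.
keys = [row[0] for row in employee_list]

unique_list = [row for row in employee_list if keys.count(row[0]) == 1]
repeated_list = [row for row in employee_list if keys.count(row[0]) > 1]

print(f"Repeated: {repeated_list}")
print(f"Unique: {unique_list}")
```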

Answer 4: (score: 0)

I went with a pure numpy solution (I added one more row to make it more general):

Let's say this is our data:

import numpy as np
data = np.array(data).astype(str)

data: array([['Name', '=', 'John'],
       ['Age', '_', '32'],
       ['Weight', '=', '60'],
       ['Name', '=', 'John'],
       ['Weight', '=', '60'],
       ['TT', '=', 'EE']], dtype='<U6')

The next step is to get the unique rows:

uniq = np.unique(data, axis=0)
uniq: array([['Age', '_', '32'],
       ['Name', '=', 'John'],
       ['TT', '=', 'EE'],
       ['Weight', '=', '60']], dtype='<U6')

Now we want to see which rows are not repeated, i.e. occur only once:

only_once = np.array([row for row in uniq if sum(np.all(row==data, axis=1)) == 1])
only_once:
array([['Age', '_', '32'],
       ['TT', '=', 'EE']], dtype='<U6')

To get the indices of the rows that occur only once:

idx = []
for row in only_once:
    lst = np.all(data == row, axis=1)
    idx.append(np.where(lst)[0])  # index of the matching row in data
idx:
[array([1]), array([5])]

The matrix containing only the repeated values:

result = np.delete(data, idx, axis=0)
result:
array([['Name', '=', 'John'],
       ['Weight', '=', '60'],
       ['Name', '=', 'John'],
       ['Weight', '=', '60']], dtype='<U6')