Question

I have a list:

[['A','B','1'],  
 ['A','D','2'],  
 ['F','B','1'],  
 ['K','B','1'],  
 ['M','D','2'],  
 ['G','H','3']  
]

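I want to keep only one row for each unique value in 'column' 2. More specifically, the new "matrix" should contain only the last two columns.

Result: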

[    
 ['B','1'],  
 ['D','2'],  
 ['H','3']  
]

There are more than 1,000,000 rows, and column 2 contains 48-digit strings, so ideally this should be fast.

Thank you,
Tom

This is what I tried:

matrixData=[['A','B','1'],['A','D','2'],['F','B','1'],['K','B','1'],['M','D','2'],['G','H','3']]  
uniqueCol2=[]  
uniqueCol3=[]  
for line in matrixData:  
    if line[1] not in uniqueCol2:  
        uniqueCol2.append(line[1])  
        uniqueCol3.append(line[2])  
print uniqueCol2  
print uniqueCol3

Result:

['B','D','H']  
['1','2','3']

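This gives me two lists; in the end I need the sum of uniqueCol3. But with more than 1,000,000 rows, and possibly because the strings are 48 digits long, the check if line[1] not in uniqueCol2: takes a very long time.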

Answer 1

You could try something like this:

 def crop(input_matrix):
     output_matrix = []
     unique = set()  # Tracks the 2nd-column entries seen so far
     for row in input_matrix:
         if row[1] not in unique:  # First time this 2nd-column value appears
             output_matrix.append(row[1:])  # Keep only the last two columns
             unique.add(row[1])  # Remember the value so later duplicates are skipped
     return output_matrix

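A set supports membership lookup in O(1) on average, so that part is as efficient as it is going to get. The overall complexity is therefore O(n) in the number of rows of the input matrix, which I think is as good as you can do unless there is some information you can use to predict which rows are non-unique.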
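Since the question says the end goal is the sum of the third column, here is a minimal sketch that skips building the output matrix and accumulates the total directly. It assumes column 3 always holds integer strings; the name `sum_unique_third_column` is made up for this example and is not part of the original code.

```python
def sum_unique_third_column(rows):
    """Sum column 3 over the first row seen for each distinct column-2 value."""
    seen = set()  # column-2 values encountered so far
    total = 0
    for row in rows:
        key = row[1]
        if key not in seen:  # average O(1) lookup, unlike `in` on a list
            seen.add(key)
            total += int(row[2])  # assumption: column 3 is an integer string
    return total


matrix_data = [['A', 'B', '1'], ['A', 'D', '2'], ['F', 'B', '1'],
               ['K', 'B', '1'], ['M', 'D', '2'], ['G', 'H', '3']]
print(sum_unique_third_column(matrix_data))  # 1 + 2 + 3 = 6
```

The only structural change from the list-based attempt in the question is that the "already seen" check runs against a set, so each check no longer scans every value collected so far.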

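You could definitely code-golf this down to a couple of lines with a list comprehension, but I have not done so, for the sake of clarity.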
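For the curious, the compact version might look roughly like this. It is only a sketch: `crop_compact` is a hypothetical name, and the trick relies on `set.add` returning `None`, which is exactly the kind of thing that makes it harder to read.

```python
def crop_compact(input_matrix):
    seen = set()
    # `seen.add(...)` returns None, so `not seen.add(...)` is always True;
    # it is there only to record the value after the membership test passes.
    return [row[1:] for row in input_matrix
            if row[1] not in seen and not seen.add(row[1])]


matrix_data = [['A', 'B', '1'], ['A', 'D', '2'], ['F', 'B', '1'],
               ['K', 'B', '1'], ['M', 'D', '2'], ['G', 'H', '3']]
print(crop_compact(matrix_data))
```

It does the same work as the explicit loop (one pass, one set lookup per row), so the choice between the two is about readability rather than speed.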
