Compare two lists and return the non-matching items

Time: 2014-06-30 18:33:00

Tags: python list compare

I have two lists:

nodes = [[nodeID1, x1, y1, z1],[nodeID2, x2, y2, z2],...,[nodeIDn, xn, yn, zn]]

subsetA_nodeID = [[nodeIDa], [nodeIDb], ...]

I want to compare these two lists and return a new list of [nodeID, x, y, z] entries from nodes whose nodeIDs do not match any of those in subsetA_nodeID.

I can do it like this:

new_list = []
for line in nodes:
    nodeID = line[0]
    matched = False
    # scan the whole subset for this node's ID
    for line2 in subsetA_nodeID:
        if line2[0] == nodeID:
            matched = True
            break
    if not matched:
        new_list.append(line)

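This code is very inefficient. I am looking for a fast way to do this. I tried dictionaries, but I could not get them to work properly. Any ideas?

Thanks!

4 answers:

Answer 0: (score: 3)

I would suggest flattening subsetA_nodeID first: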

ssa_flat = [x for sublist in subsetA_nodeID for x in sublist] 

Or, if every sublist in subsetA_nodeID is guaranteed to contain exactly one element:

ssa_flat = [x[0] for x in subsetA_nodeID]

If the node IDs are hashable, consider making ssa_flat a set:

ssa_flat = set(ssa_flat)

Then you can create the new list like this:

lst = [x[0] for x in nodes if x[0] not in ssa_flat]

Edit: if lst should contain the [NodeID, x, y, z] lists, just change the first x[0] (the output expression) to x.
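Putting the steps above together, a minimal sketch might look like the following. The integer IDs and coordinates are made up for illustration, and the helper name filter_unmatched is hypothetical, not part of the original answer.

```python
def filter_unmatched(nodes, subset_ids):
    """Return the rows of `nodes` whose ID (first element) is not in `subset_ids`."""
    # Flatten [[id], [id], ...] into a set for fast membership tests.
    excluded = {node_id for sublist in subset_ids for node_id in sublist}
    # Keep the full [nodeID, x, y, z] row, not just the ID.
    return [row for row in nodes if row[0] not in excluded]


# Illustrative data (not from the original question).
nodes = [[1, 0.0, 0.0, 0.0], [2, 1.0, 0.0, 0.0], [3, 0.0, 1.0, 0.0], [4, 1.0, 1.0, 0.0]]
subsetA_nodeID = [[1], [3]]

new_list = filter_unmatched(nodes, subsetA_nodeID)
print(new_list)  # [[2, 1.0, 0.0, 0.0], [4, 1.0, 1.0, 0.0]]
```

Building the set costs one pass over the subset, after which each node is checked in roughly constant time instead of being compared against every subset entry.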

Answer 1: (score: 2)

numpy is your friend for things like this...

import itertools
import numpy

a = numpy.array(nodes)
list_of_ids = list(itertools.chain(*subsetA_nodeID))  # flatten into a real list
# numpy.isin supersedes the older numpy.in1d; column 0 holds the nodeID
mask = ~numpy.isin(a[:, 0], list_of_ids)  # membership test, negated
print(a[mask])  # show the rows that match this condition

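I would also suggest making list_of_ids a set for pure-Python membership tests, since set lookups are much faster (numpy may already do something similar under the hood; I'm not sure). For the numpy call itself, keep it a list or an array: numpy converts a set into a one-element object array rather than an array of its values.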
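As a self-contained sketch of the mask approach, assuming purely numeric node data (the values below are invented for illustration): note that numpy.array stores every column with one dtype, so integer IDs come back as floats here.

```python
import numpy as np

# Illustrative numeric data (not from the original question).
nodes = [[1, 0.0, 0.0, 0.0], [2, 1.0, 0.0, 0.0], [3, 0.0, 1.0, 0.0], [4, 1.0, 1.0, 0.0]]
subsetA_nodeID = [[1], [3]]

a = np.array(nodes)                      # shape (4, 4), single float dtype
ids = np.array(subsetA_nodeID).ravel()   # flatten [[1], [3]] into [1, 3]
mask = ~np.isin(a[:, 0], ids)            # True where the ID is NOT in the subset
unmatched = a[mask]

print(unmatched.tolist())  # [[2.0, 1.0, 0.0, 0.0], [4.0, 1.0, 1.0, 0.0]]
```

If the IDs are strings and the coordinates are numbers, a single array would coerce everything to strings, so keeping the IDs in a separate array may be the safer layout.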

Answer 2: (score: 2)

You could try a list comprehension to go through them:

new_list = [node for node in nodes if node[0] not in subsetA_nodeID]

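I'm not sure how efficient this is compared with the other answers, though. As another answer notes, subsetA_nodeID has to be flattened into a one-dimensional list for this to work: as given, it is a list of one-element lists, so node[0] never equals any of its items.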
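The difference flattening makes can be seen in a short sketch; the string IDs and coordinates are placeholders chosen for illustration only.

```python
# Illustrative data (not from the original question).
nodes = [["n1", 0, 0, 0], ["n2", 1, 0, 0], ["n3", 0, 1, 0]]
subsetA_nodeID = [["n1"], ["n3"]]

# Unflattened: "n1" is compared against ["n1"], which never matches,
# so no node is filtered out.
unfiltered = [node for node in nodes if node[0] not in subsetA_nodeID]

# Flattened: IDs are compared against IDs.
flat_ids = [sub[0] for sub in subsetA_nodeID]
filtered = [node for node in nodes if node[0] not in flat_ids]

print(len(unfiltered))  # 3
print(filtered)         # [['n2', 1, 0, 0]]
```

With a plain list, each `not in` test scans flat_ids from the start, so this stays quadratic in the worst case; wrapping flat_ids in set() avoids that when the IDs are hashable.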

Answer 3: (score: 2)

Iterating over the whole thing may not be a good idea for large problems. Besides @JoranBeasley's suggestion, pandas is another option:

In [52]:
import pandas as pd
nodes = [['nodeID1', 'x1', 'y1', 'z1'],['nodeID2', 'x2', 'y2', 'z2'],['nodeIDn', 'xn', 'yn', 'zn']]
subsetA_nodeID = [['nodeID1'], ['nodeID2']]
subsetA_nodeIDa = ['nodeID1', 'nodeID2'] #use itertools.chain to get this
In [53]:

df=pd.DataFrame(nodes)
print(df)
df.set_index(0, inplace=True)
print(df)
         0   1   2   3
0  nodeID1  x1  y1  z1
1  nodeID2  x2  y2  z2
2  nodeIDn  xn  yn  zn
          1   2   3
0                  
nodeID1  x1  y1  z1
nodeID2  x2  y2  z2
nodeIDn  xn  yn  zn
In [54]:

print(df.loc[subsetA_nodeIDa])  # .ix was removed in pandas 1.0; .loc selects by label
          1   2   3
nodeID1  x1  y1  z1
nodeID2  x2  y2  z2
In [55]:

list(map(list, df.loc[subsetA_nodeIDa].values))
Out[55]:
[['x1', 'y1', 'z1'], ['x2', 'y2', 'z2']]
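The session above selects the rows that are in the subset. To get the rows that are not in it, which is what the question asks for, one possible sketch uses a negated Index.isin mask. The variable names unmatched and result are illustrative, not from the original answer.

```python
import pandas as pd

nodes = [['nodeID1', 'x1', 'y1', 'z1'], ['nodeID2', 'x2', 'y2', 'z2'], ['nodeIDn', 'xn', 'yn', 'zn']]
subsetA_nodeIDa = ['nodeID1', 'nodeID2']

# Use the node ID (column 0) as the index.
df = pd.DataFrame(nodes).set_index(0)

# Keep only the rows whose index label is NOT in the subset.
unmatched = df[~df.index.isin(subsetA_nodeIDa)]

# Move the ID back into a column to rebuild [nodeID, x, y, z] lists.
result = unmatched.reset_index().values.tolist()
print(result)  # [['nodeIDn', 'xn', 'yn', 'zn']]
```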