我正在编写一个我将经常使用的脚本,使用不同大小的数据集,而且我必须进行一些我无法用Python直接进行的比较。
将有多个列表(大约20个或更多,但我已将它们减少到三个,例如和测试目的),所有列表都按照特定顺序具有相同数量的整数项。我想比较每个列表中相同位置的项目以找出差异。 对于定义数量的列表,这很容易:
a = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
b = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 4, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
c = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 2, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
for x,y,z in zip(a,b,c):
if x != y != z:
print x, y, z
我已经尝试在函数中包装该循环,因此参数的数量可能会有所不同,但我遇到了问题。
def compare(*args):
for x in zip(args):
???
在最后的脚本中,我将没有多个单个列表,但是所有这些列表都列在一个列表中。那会有帮助吗?如果我遍历列表列表,我将不会立即得到每个列表...
忘记函数,它无论如何都不是很有用,因为它将成为更大脚本的一部分,并且定义不同的参数太困难了。 我现在一次比较两个列表,保存那些相同的列表。这样,我以后可以轻松地从我的整个列表中删除所有这些并保留唯一的那些。
l_o_l = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 4, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 2, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
for i in range(0, (len(l_o_l)-1)):
for j in range((i+1), len(l_o_l)):
if l_o_l[i] == l_o_l[j]:
duplicates.append(key_list[i])
duplicates.append(key_list[j])
dup = list(set(duplicates))
uniques = [x for x in key_list if x not in dup]
其中key_list包含字典中我的列表的标识符。
有任何改进建议吗?
答案 0 :(得分:2)
也许是这样的
def compare(*args):
for things in zip(*args):
yield all(x == things[0] for x in things)
然后你可以像这样使用它
a = range(10)
b = range(10)
c = range(10)
d = range(11, 20)
for match in compare(a,b,c):
print match
for match in compare(a,b,c,d):
print match
这是一个使用你的例子的演示(它是一个生成器,所以你必须通过它来使用list
消耗它)
a = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
b = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 4, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
c = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 2, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print list(compare(a,b,c))
答案 1 :(得分:1)
def compare(*args):
for x in zip(args):
values_list = list(x[0]) # x[0] because x is a tuple
different_values = set(values_list) # a set does not contain identical values
if len(different_values) != 1: # if you have more than 1 value you have different values in your list
print 'different values', values_list
给你
a = [0, 0, 1]
b = [0, 1, 1]
c = [1, 1, 1]
compare(a, b, c)
>>> different values [0, 0, 1]
>>> different values [0, 1, 1]
答案 2 :(得分:0)
def compare(elements):
return len(set(elements)) == bool(elements)
如果您想知道所有列表是否相同,您可以这样做:
all(compare(elements) for elements in zip(the_lists))
另一种方法是将list
转换为tuple
并在那里使用set
:
len(set(tuple(the_list) for the_list in the_lists) == bool(the_lists)
如果你只想删除重复项,这应该更快:
the_lists = [list(elem) for elem in set(tuple(the_list) for the_list in the_lists)]
使用示例:
>>> a = range(100)
>>> b = range(100, 200)
>>> c = range(200, 300)
>>> d = a[:]
>>> e = b[:]
>>> the_lists = [a,b,c,d,e]
>>> the_lists2 = [list(elem) for elem in set(tuple(the_list) for the_list in the_lists)]
>>> [a,b,c] == sorted(the_lists2) #order is not maintained by set
True
似乎很快:
>>> timeit.timeit('[list(elem) for elem in set(tuple(the_list) for the_list in the_lists)]', 'from __main__ import the_lists', number=1000000)
7.949447154998779
执行100万次不到8秒。 (the_lists
与之前使用的相同。)
修改强>
如果您只想删除重复的list
,那么我能想到的最简单的算法是对列表列表进行排序并使用itertools.groupby
:
>>> a = range(100)
>>> b = range(100,200)
>>> c = range(200,300)
>>> d = a[:]
>>> e = b[:]
>>> the_lists = [a,b,c,d,e]
>>> the_lists.sort()
>>> import itertools as it
>>> for key, group in it.groupby(the_lists):
... if len(list(group)) == 1:
... print key
...
[200, 201, 202, ..., 297, 298, 299]
答案 3 :(得分:0)
假设列表与示例中的列表类似,我会使用:
def compare(*args):
for x in zip(args):
if min(x) != max(x):
print x
答案 4 :(得分:-1)
我认为尝试使用* args和zip变得聪明只会让问题变得混乱。我会写这样的东西:
def compare(list_of_lists):
# assuming not an empty data set
inner_len = len(list_of_lists[0])
for index in range(inner_len):
expected = list_of_lists[0][index]
for inner_list in list_of_lists:
if inner_list[index] != expected:
# report difference at this index