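I have a list of DataFrames in which each successive DataFrame repeats the data of the one before it, and I need to subtract them from one another: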
the_list[0] = [1, 2, 3]
the_list[1] = [1, 2, 3, 4, 5, 6, 7]
The DataFrames also have headers. They differ only in the number of rows.
Desired result:
the_list[0] = [1, 2, 3]
the_list[1] = [4, 5, 6, 7]
Because my list the_list contains several DataFrames, I have to work backwards, from the last df to the first, and leave the first one intact.
My current code (estwin is the_list):
estwin = [df1, df2, df3, df4]
output=([])
estwin.reverse()
for i in range(len(estwin) -1):
    difference = Diff(estwin[i], estwin[i+1])
    output.append(difference)
return(output)

def Diff(li_bigger, li_smaller):
    c = [x for x in li_bigger if x not in li_smaller]
    return (c)
Currently the result is an empty list. I need an updated the_list that contains only the differences (no values duplicated between the lists).
Answer 0 (score: 1)
Your code is not runnable as posted, but if I guess at what you meant to write, it does work, apart from one bug in your algorithm:
the_list = [
    [1, 2, 3],
    [1, 2, 3, 4, 5, 6, 7],
    [1, 2, 3, 4, 5, 6, 7, 8, 9]
]

def process(lists):
    output = []
    lists = lists[::-1]  # reversed copy, so the caller's list is not modified
    for i in range(len(lists)-1):
        # Items of the bigger list that are missing from the next smaller one
        difference = diff(lists[i], lists[i+1])
        output.append(difference)
    # BUGFIX: Always add first list (now last because of reverse)
    output.append(lists[-1])
    output.reverse()
    return output

def diff(li_bigger, li_smaller):
    return [x for x in li_bigger if x not in li_smaller]

print(the_list)
print(process(the_list))
Output:
[[1, 2, 3], [1, 2, 3, 4, 5, 6, 7], [1, 2, 3, 4, 5, 6, 7, 8, 9]]
[[1, 2, 3], [4, 5, 6, 7], [8, 9]]
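The pairwise comparison above can also be written without reversing anything, by walking over neighbouring pairs with zip. This is only a minimal sketch: the function name pairwise_diff is made up for illustration, and it assumes the items are hashable so that each smaller list can be turned into a set for faster membership tests.

```python
def pairwise_diff(lists):
    """Return a new list in which every list keeps only the items
    that are missing from its immediate predecessor."""
    if not lists:
        return []
    output = [list(lists[0])]  # the first list is kept intact
    for smaller, bigger in zip(lists, lists[1:]):
        seen = set(smaller)  # assumption: items are hashable
        output.append([x for x in bigger if x not in seen])
    return output


the_list = [
    [1, 2, 3],
    [1, 2, 3, 4, 5, 6, 7],
    [1, 2, 3, 4, 5, 6, 7, 8, 9],
]
print(pairwise_diff(the_list))
# [[1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Because a new list is built, the_list itself is left unchanged.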
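Answer 1 (score: 1)
For this problem you don't need to go backwards; it is easier to keep track of what has already been seen.
As you iterate over each list, maintain a set that is updated with the new items, and use it to filter which items should appear in the output.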
list1 = [1,2,3]
list2 = [1,2,3,4,5,6,7]
estwin = [list1, list2]
lookup = set() #to check which items/numbers have already been seen.
output = []
for lst in estwin:
    updated_lst = [i for i in lst if i not in lookup] #only new items present
    lookup.update(updated_lst)
    output.append(updated_lst)
print(output) #[[1, 2, 3], [4, 5, 6, 7]]
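If, as the question states, each list simply starts with all the rows of the previous one and differs only in length, then no membership test is needed at all: it should be enough to slice off the part that was already present. The sketch below relies entirely on that assumption, and the function name strip_prefix is made up for illustration. Unlike a set-based filter, it keeps values that legitimately repeat inside the new rows.

```python
def strip_prefix(lists):
    """Drop from each list the leading rows it shares with the previous list.

    Assumption: every list begins with the complete previous list.
    """
    output = []
    previous_len = 0
    for lst in lists:
        # Keep only the rows added since the previous list
        output.append(lst[previous_len:])
        previous_len = len(lst)
    return output


estwin = [[1, 2, 3], [1, 2, 3, 4, 5, 6, 7]]
print(strip_prefix(estwin))  # [[1, 2, 3], [4, 5, 6, 7]]
```

For real DataFrames the analogous positional slice would presumably be df.iloc[previous_len:], but that variant is not tested here.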
Answer 2 (score: 0)
A one-liner:
from itertools import chain
l = [[1, 2], [1, 2, 3], [1, 2, 3, 4], [1, 2, 3, 4, 5]]
new_l = [sorted(set(v).difference(chain.from_iterable(l[:num])))
         for num, v in enumerate(l)]
print(new_l)
# [[1, 2], [3], [4], [5]]