I ran a query in Python and got back a list of results. In these results the same person can have multiple entries, for example:
[
["1", "someone", "cool", "RO", "AC", "SKST", "yes", "2/24/2017 0:00", "2/24/2017 10:51"],
["102", "another", "person", "RO", "AC", "SKST", "No", "1/26/2015 15:54", "1/26/2015 15:54"],
["102", "another", "person", "RO", "AC", "SKST", "NO", "6/29/2015 0:00", "6/29/2015 12:36"],
["102", "another", "person", "RO", "AC", "SKST", "yes", "8/31/2017 0:00", "8/31/2017 13:12"],
["62", "again", "someoneelse", "RO", "AC", "SKST", "No", "1/30/2017 0:00", "1/30/2017 13:49"],
etc...
]
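Looking at this data, we can see that the person with id 102 has multiple entries. I want to filter this list so that we get only one entry per person, using the last date field to pick the most recent one.
So for person ID #102, we would drop all the other entries and keep the one with the most recent date: 8/31/2017 13:12
I'm new to Python, so I'm not sure how to do this. Thanks in advance.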
Answer 0: (score: 1)
You can use itertools.groupby together with dateutil (the python-dateutil package) in Python 3:
import itertools
import dateutil.parser
s = [
["1", "someone", "cool", "RO", "AC", "SKST", "yes", "2/24/2017 0:00", "2/24/2017 10:51"],
["102", "another", "person", "RO", "AC", "SKST", "No", "1/26/2015 15:54", "1/26/2015 15:54"],
["102", "another", "person", "RO", "AC", "SKST", "NO", "6/29/2015 0:00", "6/29/2015 12:36"],
["102", "another", "person", "RO", "AC", "SKST", "yes", "8/31/2017 0:00", "8/31/2017 13:12"],
["62", "again", "someoneelse", "RO", "AC", "SKST", "No", "1/30/2017 0:00", "1/30/2017 13:49"],
]
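# Group the rows by id (groupby needs its input sorted by the same key),
# then sort each group by the parsed last date field, oldest first
new_data = [(a, sorted([i[1:] for i in b], key=lambda x: dateutil.parser.parse(x[-1]))) for a, b in itertools.groupby(sorted(s, key=lambda x: x[0]), key=lambda x: x[0])]
# Re-attach the id to the newest (last) row of each group
final_data = [[a] + b[-1] for a, b in new_data]
for i in final_data:
    print(i)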
Output:
['1', 'someone', 'cool', 'RO', 'AC', 'SKST', 'yes', '2/24/2017 0:00', '2/24/2017 10:51']
['102', 'another', 'person', 'RO', 'AC', 'SKST', 'yes', '8/31/2017 0:00', '8/31/2017 13:12']
['62', 'again', 'someoneelse', 'RO', 'AC', 'SKST', 'No', '1/30/2017 0:00', '1/30/2017 13:49']
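For comparison, here is a minimal standard-library sketch of the same "keep the newest row per id" idea. It assumes every timestamp follows the M/D/YYYY H:MM layout seen in the sample data; the helper names parse_ts and latest_per_id are made up for illustration and are not part of the answer above.

```python
from datetime import datetime

def parse_ts(text):
    # Assumption: timestamps look like "8/31/2017 13:12" (M/D/YYYY H:MM)
    return datetime.strptime(text, "%m/%d/%Y %H:%M")

def latest_per_id(rows):
    # Map each id to the newest row seen so far.
    # Dicts keep insertion order in Python 3.7+, so ids stay in first-seen order.
    latest = {}
    for row in rows:
        kept = latest.get(row[0])
        if kept is None or parse_ts(row[-1]) > parse_ts(kept[-1]):
            latest[row[0]] = row
    return list(latest.values())

rows = [
    ["1", "someone", "cool", "RO", "AC", "SKST", "yes", "2/24/2017 0:00", "2/24/2017 10:51"],
    ["102", "another", "person", "RO", "AC", "SKST", "No", "1/26/2015 15:54", "1/26/2015 15:54"],
    ["102", "another", "person", "RO", "AC", "SKST", "NO", "6/29/2015 0:00", "6/29/2015 12:36"],
    ["102", "another", "person", "RO", "AC", "SKST", "yes", "8/31/2017 0:00", "8/31/2017 13:12"],
    ["62", "again", "someoneelse", "RO", "AC", "SKST", "No", "1/30/2017 0:00", "1/30/2017 13:49"],
]

result = latest_per_id(rows)
for row in result:
    print(row)
```

This makes a single pass without sorting, so the rows do not need to arrive grouped by id.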
Answer 1: (score: 0)
If you want hassle-free code that is easy to grasp:
m = [
["1", "someone", "cool", "RO", "AC", "SKST", "yes", "2/24/2017 0:00", "2/24/2017 10:51"],
["102", "another", "person", "RO", "AC", "SKST", "No", "1/26/2015 15:54", "1/26/2015 15:54"],
["102", "another", "person", "RO", "AC", "SKST", "NO", "6/29/2015 0:00", "6/29/2015 12:36"],
["102", "another", "person", "RO", "AC", "SKST", "yes", "8/31/2017 0:00", "8/31/2017 13:12"],
["62", "again", "someoneelse", "RO", "AC", "SKST", "No", "1/30/2017 0:00", "1/30/2017 13:49"],
]
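from datetime import datetime
from more_itertools import unique_everseen
# Sort by id, then by the parsed last date field, newest first
# (parsing avoids comparing the dates as plain strings)
list1 = sorted(m, key=lambda x: (x[0], datetime.strptime(x[8], "%m/%d/%Y %H:%M")), reverse=True)
# Keep only the first (newest) row seen for each id
out = list(unique_everseen(list1, key=lambda x: x[0]))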
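One thing to watch with any sort on these fields: dates written as M/D/YYYY do not order chronologically when compared as plain strings. A small sketch of the difference, using two hypothetical timestamps that are not in the sample data and an illustrative helper named parse_ts:

```python
from datetime import datetime

def parse_ts(text):
    # Assumption: timestamps look like "10/1/2017 8:00" (M/D/YYYY H:MM)
    return datetime.strptime(text, "%m/%d/%Y %H:%M")

# Hypothetical values chosen to show the problem
older = "9/30/2017 8:00"
newer = "10/1/2017 8:00"

# As strings, "1" sorts before "9", so the later date looks smaller
string_order_ok = newer > older
# As datetime objects, the comparison is chronological
parsed_order_ok = parse_ts(newer) > parse_ts(older)

print(string_order_ok, parsed_order_ok)
```

With the sample rows the string order happens to agree with the real order, which can hide the issue until a two-digit month shows up.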
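Answer 2: (score: 0)
If you want to work out your own logic without importing any itertools module, you can try a pure-Python approach.
Just one note:
I added a duplicate entry for id 1, with two different dates, as a test case: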
data=[
["1", "someone", "cool", "RO", "AC", "SKST", "yes", "2/24/2017 0:00", "2/24/2017 10:51"],
["1", "someone", "cool", "RO", "AC", "SKST", "yes", "2/25/2017 0:00", "2/26/2017 10:51"],
["102", "another", "person", "RO", "AC", "SKST", "No", "1/26/2015 15:54", "1/26/2015 15:54"],
["102", "another", "person", "RO", "AC", "SKST", "NO", "6/29/2015 0:00", "6/29/2015 12:36"],
["102", "another", "person", "RO", "AC", "SKST", "yes", "8/31/2017 0:00", "8/31/2017 13:12"],
["62", "again", "someoneelse", "RO", "AC", "SKST", "No", "1/30/2017 0:00", "1/30/2017 13:49"]
]
from datetime import datetime

def last_updated(row):
    # Parse the final field so dates compare chronologically, not as strings
    return datetime.strptime(row[-1], "%m/%d/%Y %H:%M")

track = []            # ids already seen
no_duplicate = []     # one row per id, in first-seen order
duplicate_dict = {}   # id -> every row collected for that id
for row in data:
    if row[0] not in track:
        # First time this id appears: keep the row for now
        track.append(row[0])
        no_duplicate.append(row)
        duplicate_dict[row[0]] = [row]
    else:
        # Repeated id: collect the row with the others for that id
        duplicate_dict[row[0]].append(row)

# Replace each kept row with the most recent row for the same id
for index, row in enumerate(no_duplicate):
    no_duplicate[index] = max(duplicate_dict[row[0]], key=last_updated)
print(no_duplicate)
Output:
[['1', 'someone', 'cool', 'RO', 'AC', 'SKST', 'yes', '2/25/2017 0:00', '2/26/2017 10:51'], ['102', 'another', 'person', 'RO', 'AC', 'SKST', 'yes', '8/31/2017 0:00', '8/31/2017 13:12'], ['62', 'again', 'someoneelse', 'RO', 'AC', 'SKST', 'No', '1/30/2017 0:00', '1/30/2017 13:49']]