I have a list of lists:
[['Id', 'fname', 'lname', 'gender', 'startdate'],
['100', 'John', 'Jackson', 'M', '08/09/2000'],
['101', 'Jenny', 'Hobbs', 'F', '01/13/1995'],
['100', 'John', 'Jackson', 'M', '08/09/1995']]
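I want to remove duplicate lists where Id == Id and one startdate is earlier than the other, keeping only the list with the latest startdate for each unique Id. The result should be: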
[['Id', 'fname', 'lname', 'gender', 'startdate'],
['100', 'John', 'Jackson', 'M', '08/09/2000'],
['101', 'Jenny', 'Hobbs', 'F', '01/13/1995']]
Any help would be great.
Answer 0 (score: 4)
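Whichever approach you use, the startdate values need to be parsed before they are compared: as mm/dd/yyyy strings they do not compare chronologically, because string comparison looks at the month first. A minimal sketch using datetime.strptime; the two dates are made-up examples, not rows from the question:

```python
from datetime import datetime

# two made-up mm/dd/yyyy dates, not from the question's data
a, b = '12/01/1990', '01/01/2000'
print(a > b)  # True: plain string comparison sees '1' > '0' in the month

fmt = '%m/%d/%Y'
# parsed dates compare chronologically: 1990 is earlier than 2000
print(datetime.strptime(a, fmt) > datetime.strptime(b, fmt))  # False
```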
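Sort the rows by date, then load them into a dictionary keyed on the Id string. The only thing you have to do yourself is remove the header row before running this.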
import time

data = [['100', 'John', 'Jackson', 'M', '08/09/2000'],
        ['101', 'Jenny', 'Hobbs', 'F', '01/13/1995'],
        ['100', 'John', 'Jackson', 'M', '08/09/1995']]

# sort data in ascending date order
data = sorted(data, key=lambda x: time.strptime(x[4], '%m/%d/%Y'))
keys = [x[0] for x in data]
# add to a dictionary keyed on Id; more recent rows overwrite older ones
d = dict(zip(keys, data))
print(list(d.values()))
Output (Python 3.7+, where dicts keep insertion order):
[['101', 'Jenny', 'Hobbs', 'F', '01/13/1995'], ['100', 'John', 'Jackson', 'M', '08/09/2000']]
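The answer leaves header removal to the reader. A minimal sketch, assuming the header is always the first row, that strips it, applies the same sort-then-dict idea, and puts the header back so the result matches the expected output in the question. The function name latest_per_id is hypothetical:

```python
import time

def latest_per_id(table):
    """Keep the most recent row per Id; assumes table[0] is the header row."""
    header, rows = table[0], table[1:]
    # ascending by startdate, so newer rows overwrite older ones in the dict
    rows = sorted(rows, key=lambda r: time.strptime(r[4], '%m/%d/%Y'))
    latest = {r[0]: r for r in rows}
    # restore a stable order (numeric Id) and put the header back on top
    return [header] + sorted(latest.values(), key=lambda r: int(r[0]))

table = [['Id', 'fname', 'lname', 'gender', 'startdate'],
         ['100', 'John', 'Jackson', 'M', '08/09/2000'],
         ['101', 'Jenny', 'Hobbs', 'F', '01/13/1995'],
         ['100', 'John', 'Jackson', 'M', '08/09/1995']]

print(latest_per_id(table))
```

Sorting the survivors by int(Id) at the end is a design choice so the output order does not depend on the order of the dates.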
Answer 1 (score: 1)
Similar to @Maria Zverina's answer, but more structured:
import time

data = [
    ['100', 'John', 'Jackson', 'M', '08/09/2000'],
    ['101', 'Jenny', 'Hobbs', 'F', '01/13/1995'],
    ['100', 'John', 'Jackson', 'M', '08/09/1995']
]
# sort by date, ascending
data.sort(key=lambda d: time.strptime(d[4], "%m/%d/%Y"))
# load into a dict keyed on Id; later (newer) rows overwrite earlier ones
latest = {d[0]: d for d in data}
# return to a list, sorted by Id
data = sorted(latest.values(), key=lambda d: int(d[0]))
Result:
# most recent data for each ID, sorted by ID:
[
['100', 'John', 'Jackson', 'M', '08/09/2000'],
['101', 'Jenny', 'Hobbs', 'F', '01/13/1995']
]
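For comparison, a sketch that reaches the same result without relying on dict overwrite order. It swaps in a different technique: group the rows by Id with itertools.groupby and pick the row with the latest parsed date in each group using max. The helper parse_date is hypothetical:

```python
from datetime import datetime
from itertools import groupby

def parse_date(s):
    # startdate is month/day/year, e.g. '08/09/2000'
    return datetime.strptime(s, '%m/%d/%Y')

data = [
    ['100', 'John', 'Jackson', 'M', '08/09/2000'],
    ['101', 'Jenny', 'Hobbs', 'F', '01/13/1995'],
    ['100', 'John', 'Jackson', 'M', '08/09/1995']
]

# groupby only groups *adjacent* rows, so sort by Id first
by_id = sorted(data, key=lambda r: int(r[0]))
latest = [max(group, key=lambda r: parse_date(r[4]))
          for _, group in groupby(by_id, key=lambda r: r[0])]
print(latest)
```

Sorting by Id once and taking a max per group avoids depending on which duplicate happens to be written to the dict last.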
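Answer 2 (score: 0)
Here is another solution: I add each Id to a set the first time I see it and skip any later row with the same Id. The orig variable holds the original list of lists, and res is the list with the duplicates removed. Because the first row seen for each Id wins, orig has to be sorted newest first for this to keep the latest startdate.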
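import time

# sort newest first, so the first row seen for each Id is its latest one
orig.sort(key=lambda x: time.strptime(x[4], '%m/%d/%Y'), reverse=True)

mod_set = set()  # Ids already seen
res = list()     # rows with duplicates removed
for x in orig:
    if x[0] not in mod_set:
        res.append(x)
        mod_set.add(x[0])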
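The set-based loop keeps whichever row for an Id comes first, so its result depends on input order. A small sketch that wraps that loop in a function (the name first_seen is hypothetical) and runs it on the question's rows as given and then reversed:

```python
def first_seen(rows):
    """Keep the first row for each Id, as in the set-based loop above."""
    seen = set()
    out = []
    for row in rows:
        if row[0] not in seen:
            out.append(row)
            seen.add(row[0])
    return out

rows = [['100', 'John', 'Jackson', 'M', '08/09/2000'],
        ['101', 'Jenny', 'Hobbs', 'F', '01/13/1995'],
        ['100', 'John', 'Jackson', 'M', '08/09/1995']]

# John's 2000 row is kept only because it happens to come first
print(first_seen(rows))
# reversed input: John's 1995 row is kept instead
print(first_seen(list(reversed(rows))))
```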
Answer 3 (score: 0)
Here is a small script that does what you want:
import time

mylist = [['100', 'John', 'Jackson', 'M', '08/09/2000'],
          ['101', 'Jenny', 'Hobbs', 'F', '01/13/1995'],
          ['100', 'John', 'Jackson', 'M', '08/09/1995']]

latest = {}  # Id -> [fname, lname, gender, startdate]
for sublist in mylist:
    id_, fname, lname, gender, startdate = sublist
    if id_ not in latest:
        latest[id_] = [fname, lname, gender, startdate]
    else:
        olddate = latest[id_][3]
        # dates are month/day/year, so parse them with '%m/%d/%Y'
        if time.strptime(startdate, '%m/%d/%Y') > time.strptime(olddate, '%m/%d/%Y'):
            latest[id_] = [fname, lname, gender, startdate]
print(latest)
Output: {'100': ['John', 'Jackson', 'M', '08/09/2000'], '101': ['Jenny', 'Hobbs', 'F', '01/13/1995']}
At the end, the dictionary maps each unique Id to its most recent record.
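The format string matters here: the question's dates are month/day/year, and parsing them as day/month/year treats a day such as 13 as a month. A minimal sketch showing that time.strptime accepts '01/13/1995' with '%m/%d/%Y' but raises ValueError with '%d/%m/%Y':

```python
import time

date = '01/13/1995'  # month/day/year, as in the question

# correct format: month 1, day 13
print(time.strptime(date, '%m/%d/%Y').tm_mon)  # 1

try:
    time.strptime(date, '%d/%m/%Y')  # would need 13 to be a valid month
except ValueError as exc:
    print('ValueError:', exc)
```

Dates like '08/09/2000' parse under either format without error, which is why a wrong format string can go unnoticed until a day above 12 shows up.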