Reorganizing and selecting elements of nested lists

Time: 2012-08-08 10:29:24

Tags: python list parsing

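I have a list of lists. Each nested list contains 4 or 5 elements (ID, date, time, name, notes). I want to extract, for each person, the nested list holding their first entry of each day. Currently I have: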

NestedList = [[100, 08/08/2012, 8:00, John Smith], [100, 08/09/2012, 9:20, John Smith], [100, 08/08/2012, 10:00, John Smith], ..., [131, 08/10/2012, 8:00, Jane Williams], [131, 08/12/2012, 22:00, Jane Willams], ... (thousands of entries with hundreds of people)]

I would like to end up with something like this:

NewList = [[100, 8/08/2012, 8:00, John Smith], [100, 8/09/2012, 8:02, John Smith], ...,      [131, 8/08/2012, 8:00, Jane Williams], [131, 08/09/2012, 8:05, Jane Williams], ...]

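The clock uses 24-hour time rather than 12-hour. I have already sorted the list by ID number, then by date and time, so honestly I just need the first entry for each person or ID number. I'm sorry if this is very basic, but I couldn't find anything that helped.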

1 answer:

Answer 0: (score: 1)

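It sounds like you want one sublist for each date-name pair. This seems like a good use case for a dictionary: (date, name) is the key, and the earliest record for that pair is the value.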

#uses an iterable `seq` to populate a dictionary.
#the function `keyFunc` will be called on each element of seq to generate keys.
#if two elements `a` and `b` have the same key, 
#`compFunc(a,b)` will return which element should belong in the dict.
def make_dict(seq, keyFunc, compFunc):
    d = {}
    for element in seq:
        key = keyFunc(element)
        if key not in d:
            d[key] = element
        else:
            d[key] = compFunc(d[key], element)
    return d

#I've put all your elements in quotes so that it's valid python. 
#You can use whatever types you prefer, 
#as long as the date and name can be used as a key, 
#and the time supports comparison.
NestedList = [
['100', '08/08/2012', '08:00', 'John Smith'], 
['100', '08/09/2012', '09:20', 'John Smith'], 
['100', '08/08/2012', '10:00', 'John Smith'], 
['131', '08/10/2012', '08:00', 'Jane Williams'], 
['131', '08/12/2012', '22:00', 'Jane Williams']
]

#the key is generated from the element's date and name
keyFunc = lambda x: (x[1], x[3])

#prefer the element with the smaller time
#(plain string comparison works here only because the times are zero-padded 24-hour "HH:MM")
compFunc = lambda a,b: a if a[2] < b[2] else b

#wrap in list() so this also works on Python 3, where values() returns a view
NewList = list(make_dict(NestedList, keyFunc, compFunc).values())
NewList.sort() #optional

print(NewList)

Output:

[
['100', '08/08/2012', '08:00', 'John Smith'], 
['100', '08/09/2012', '09:20', 'John Smith'], 
['131', '08/10/2012', '08:00', 'Jane Williams'], 
['131', '08/12/2012', '22:00', 'Jane Williams']
]
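
Since the question says the list is already sorted by ID, then by date and time, a single pass that keeps the first record seen for each (ID, date) pair might be enough. Here is a minimal sketch under that assumption. The names `first_per_day` and `records` are made up for illustration, and it keys on the ID rather than the name:

```python
def first_per_day(records):
    """Keep the first record seen for each (ID, date) pair, preserving order.

    Assumes the input is already sorted by ID, then date, then time,
    so the first record seen for a pair is also the earliest one.
    """
    seen = set()
    result = []
    for record in records:
        key = (record[0], record[1])  # (ID, date)
        if key not in seen:
            seen.add(key)
            result.append(record)
    return result


# Sample data in sorted order (ID, then date, then time).
records = [
    ['100', '08/08/2012', '08:00', 'John Smith'],
    ['100', '08/08/2012', '10:00', 'John Smith'],
    ['100', '08/09/2012', '09:20', 'John Smith'],
    ['131', '08/10/2012', '08:00', 'Jane Williams'],
    ['131', '08/12/2012', '22:00', 'Jane Williams'],
]

first_entries = first_per_day(records)
print(first_entries)
```

Unlike the dictionary version, this never compares times, so it relies entirely on the input order being correct. If the order cannot be trusted, the dictionary approach with a comparison function is the safer choice.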