Python - How to extract the last matching item per key from a list

Time: 2016-06-22 15:25:28

Tags: python list data-manipulation

For example, I have the following data as a list:

l = [['A', 'aa', '1', '300'],
     ['A', 'ab', '2', '30'],
     ['A', 'ac', '3', '60'],
     ['B', 'ba', '5', '50'],
     ['B', 'bb', '4', '10'],
     ['C', 'ca', '6', '50']]

Now, for each of 'A', 'B', and 'C', I want the last occurrence, i.e.:

[['A', 'ac', '3', '60'],
 ['B', 'bb', '4', '10'],
 ['C', 'ca', '6', '50']]

Or, going one step further, just the third column of those occurrences, i.e.:

['3', '4', '6']

Currently, this is how I handle it:

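import pandas as pd

df = pd.DataFrame(l, columns=['u', 'w', 'y', 'z'])
df.set_index('u', inplace=True)
ll = []
for letter in df.index.unique():
    # Originally df.ix[letter, 'y'][-1]; .ix was removed in pandas 1.0.
    # Selecting with [letter] always yields a Series, even for a single row.
    ll.append(df.loc[[letter], 'y'].iloc[-1])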

Then I ran %timeit on it, which shows:

>> The slowest run took 27.86 times longer than the fastest. 
>> This could mean that an intermediate result is being cached.
>> 1000000 loops, best of 3: 887 ns per loop

I am just wondering whether there is a way to do this in less time than my code takes. Thanks!

5 answers:

Answer 0: (score: 2)

l =  [['A', 'aa', '1', '300'],
  ['A', 'ab', '2', '30'],
  ['A', 'ac', '3', '60'],
  ['B', 'ba', '5', '50'],
  ['B', 'bb', '4', '10'],
  ['C', 'ca', '6', '50']]

import itertools
for key, group in itertools.groupby(l, lambda x: x[0]):
    print(key, list(group)[-1])

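I won't comment on "efficiency" because you haven't explained your situation at all. This assumes the list is already sorted by the first element of each sublist.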
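If the rows are not already grouped by their first element, itertools.groupby yields the same key more than once. A minimal sketch of one way around that, assuming it is acceptable to sort a copy of the data first (the names `rows` and `last_per_key` are illustrative and not part of the original answer):

```python
import itertools
from operator import itemgetter

# Deliberately unsorted sample data, for illustration only.
rows = [['B', 'ba', '5', '50'],
        ['A', 'aa', '1', '300'],
        ['B', 'bb', '4', '10'],
        ['A', 'ac', '3', '60'],
        ['C', 'ca', '6', '50']]

def last_per_key(data):
    """Return the last row for each key (first element), keys in sorted order."""
    first = itemgetter(0)
    # sorted() is stable, so rows with equal keys keep their original relative order.
    ordered = sorted(data, key=first)
    return [list(group)[-1] for _, group in itertools.groupby(ordered, key=first)]

print(last_per_key(rows))
```

Sorting costs O(n log n), so this only makes sense when the input order cannot be relied on.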

If the list is sorted, a single pass is enough:

def tidy(l):
    tmp = []
    prev_row = l[0]

    for row in l:
        if row[0] != prev_row[0]:
            tmp.append(prev_row)
        prev_row = row
    tmp.append(prev_row)
    return tmp

In a timeit test this is about 5x faster than itertools.groupby. Demo: https://repl.it/C5Af/0
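A comparison of this kind can be reproduced locally with the standard timeit module. The following is only a sketch: `groupby_last` is a hypothetical wrapper name, the loop count is arbitrary, and the absolute numbers will vary by machine and Python version.

```python
import itertools
import timeit

l = [['A', 'aa', '1', '300'],
     ['A', 'ab', '2', '30'],
     ['A', 'ac', '3', '60'],
     ['B', 'ba', '5', '50'],
     ['B', 'bb', '4', '10'],
     ['C', 'ca', '6', '50']]

def tidy(rows):
    """Single pass over rows that are already grouped by their first element."""
    tmp = []
    prev_row = rows[0]
    for row in rows:
        if row[0] != prev_row[0]:
            tmp.append(prev_row)
        prev_row = row
    tmp.append(prev_row)
    return tmp

def groupby_last(rows):
    """Same result via itertools.groupby, materializing each group."""
    return [list(group)[-1] for _, group in itertools.groupby(rows, lambda x: x[0])]

# Both approaches should agree before their timings are compared.
assert tidy(l) == groupby_last(l)

n = 10000  # arbitrary loop count
print("tidy:   ", timeit.timeit(lambda: tidy(l), number=n))
print("groupby:", timeit.timeit(lambda: groupby_last(l), number=n))
```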

[Edit: the OP updated their question to say they are already using Pandas for the groupby, which may already be faster]

Answer 1: (score: 1)

I'm not sure I understand your question, but here is something you could do:

li = [l[i][0] for i in range(len(l))]
[l[j][2] for j in [''.join(li).rfind(i) for i in set(li)]]

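Note that the output is ['3', '4', '6'] (in no guaranteed order, since it iterates over a set), because the last occurrence of A appears to be the third row, not the second. This relies on every key being a single character, so that a position in the joined string is also a row index.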

Edit, since you seem to care a lot about performance (although you haven't said what you tried or what counts as "good"):

%timeit li = [l[i][0] for i in range(len(l))]
%timeit [l[j][2] for j in [''.join(li).rfind(i) for i in set(li)]]
>> 1000000 loops, best of 3: 1.19 µs per loop
>> 100000 loops, best of 3: 2.57 µs per loop

%timeit [list(group)[-1][2] for key, group in itertools.groupby(l, lambda x: x[0])]
>> 100000 loops, best of 3: 5.11 µs per loop

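So it looks like the list comprehension is slightly faster than itertools (although I'm no benchmarking expert, and there may be a better way to run the itertools version).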

Answer 2: (score: 1)

{l[0]: l[2] for l in vals} gives you a mapping from 'A', 'B', and 'C' to their last value.
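A short sketch of this idea, assuming `vals` is the list of rows from the question and Python 3.7+, where dicts preserve insertion order (`last_by_key` is an illustrative name):

```python
vals = [['A', 'aa', '1', '300'],
        ['A', 'ab', '2', '30'],
        ['A', 'ac', '3', '60'],
        ['B', 'ba', '5', '50'],
        ['B', 'bb', '4', '10'],
        ['C', 'ca', '6', '50']]

# Later rows overwrite earlier ones, so each key ends up with its last value.
last_by_key = {row[0]: row[2] for row in vals}

print(last_by_key)                 # {'A': '3', 'B': '4', 'C': '6'}
print(list(last_by_key.values()))  # ['3', '4', '6']
```

Unlike the groupby approach, this does not need the rows to be sorted by key.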

Answer 3: (score: 0)

A not very Pythonic approach (note that Nils's solution, which uses a list comprehension, is the most Pythonic):

def get_last_row(xs,q):
    for i in range(len(xs)-1,-1,-1):
        if xs[i][0] == q:
            return xs[i][2]

def get_third_cols(xs):
    third_cols = []
    for q in ["A","B","C"]:
        third_cols.append(get_last_row(xs,q))
    return third_cols

print(get_third_cols(xs))  # xs is the list of rows from the question

This prints ['3', '4', '6'], if that is what you mean by the last occurrence.

Answer 4: (score: 0)

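This generalizes to any key/value position. Note that the output is in the order in which each key is first observed. It would not be hard to adjust it so that the output is in the order in which the output values are observed.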

import operator

l = [['A', 'aa', '1', '300'],
  ['A', 'ab', '2', '30'],
  ['A', 'ac', '3', '60'],
  ['B', 'ba', '5', '50'],
  ['B', 'bb', '4', '10'],
  ['C', 'ca', '6', '50']]

def getLast(data, key, value):
    f = operator.itemgetter(key,value)
    store = dict()
    keys = []
    for row in data:
        key, value = f(row)
        if key not in store:
            keys.append(key)
        store[key] = value
    return [store[k] for k in keys]

Now timing it,

%timeit getLast(l,0,2)

gives:

The slowest run took 9.44 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 2.85 µs per loop

Function output:

['3', '4', '6']
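The ordering adjustment mentioned in this answer could look like the following sketch. It assumes Python 3.7+ (insertion-ordered dicts); `get_last_by_last_seen` and the sample `rows` are hypothetical and not part of the original answer.

```python
import operator

def get_last_by_last_seen(data, key_pos, value_pos):
    """Return the last value per key, ordered by where each key was last seen."""
    getter = operator.itemgetter(key_pos, value_pos)
    store = {}
    for row in data:
        key, value = getter(row)
        # Remove and re-insert so the key moves to the end of the dict.
        store.pop(key, None)
        store[key] = value
    return list(store.values())

# Interleaved keys, to make the difference in ordering visible.
rows = [['A', 'aa', '1', '300'],
        ['B', 'ba', '5', '50'],
        ['A', 'ac', '3', '60'],
        ['C', 'ca', '6', '50'],
        ['B', 'bb', '4', '10']]

print(get_last_by_last_seen(rows, 0, 2))  # ['3', '6', '4']
```

Here 'B' comes last because its final row appears after the final rows of 'A' and 'C'.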