For example, I have the following data as a list:
l = [['A', 'aa', '1', '300'],
['A', 'ab', '2', '30'],
['A', 'ac', '3', '60'],
['B', 'ba', '5', '50'],
['B', 'bb', '4', '10'],
['C', 'ca', '6', '50']]
Now, for each of 'A', 'B' and 'C', I want the last occurrence, i.e.:
[['A', 'ac', '3', '60'],
['B', 'bb', '4', '10'],
['C', 'ca', '6', '50']]
Or, going one step further, just the third column of those rows, i.e.:
['3', '4', '6']
Currently, this is how I handle it:
import pandas as pd
df = pd.DataFrame(l, columns=['u', 'w', 'y', 'z'])
df.set_index('u', inplace=True)
ll = []
for letter in df.index.unique():
    # Note: .ix was removed in pandas 1.0; .loc is the label-based replacement.
    ll.append((df.ix[letter, 'y'][-1]))
Then I ran %timeit on it, which shows:
>> The slowest run took 27.86 times longer than the fastest.
>> This could mean that an intermediate result is being cached.
>> 1000000 loops, best of 3: 887 ns per loop
I'm just wondering whether there is a way to do this in less time than my code takes. Thanks!
Answer 0: (score: 2)
l = [['A', 'aa', '1', '300'],
['A', 'ab', '2', '30'],
['A', 'ac', '3', '60'],
['B', 'ba', '5', '50'],
['B', 'bb', '4', '10'],
['C', 'ca', '6', '50']]
import itertools
for key, group in itertools.groupby(l, lambda x: x[0]):
    # Each group is an iterator; materialize it and take its last row.
    print(key, list(group)[-1])
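I'm not commenting on "efficiency" because you haven't really explained your situation. This assumes the list is pre-sorted by the first element of each sublist.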
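To illustrate why the sorting assumption matters, here is a minimal sketch with a deliberately shuffled copy of the data. The names `rows` and `last_rows` are made up for this example. It relies on `sorted()` being stable, so rows that share a key keep their original relative order.

```python
import itertools
from operator import itemgetter

# Hypothetical, deliberately unsorted variant of the question's data.
rows = [['B', 'ba', '5', '50'],
        ['A', 'aa', '1', '300'],
        ['B', 'bb', '4', '10'],
        ['A', 'ac', '3', '60'],
        ['C', 'ca', '6', '50']]

def last_rows(rows):
    """Return the last row for each key (first element), sorting first."""
    # sorted() is stable: rows with equal keys keep their original order.
    ordered = sorted(rows, key=itemgetter(0))
    return [list(group)[-1]
            for _, group in itertools.groupby(ordered, key=itemgetter(0))]

# Without sorting, groupby starts a new group every time the key changes,
# so 'A' and 'B' each show up twice here.
unsorted_keys = [key for key, _ in itertools.groupby(rows, key=itemgetter(0))]
result = last_rows(rows)
print(unsorted_keys)
print(result)
```

Sorting costs O(n log n), so if the data is already grouped by key it is cheaper to skip that step.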
If the list is sorted, a single pass is enough:
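def tidy(l):
    # Guard against an empty list, where l[0] would raise IndexError.
    if not l:
        return []
    tmp = []
    prev_row = l[0]
    for row in l:
        # When the key changes, the previous row was the last one of its group.
        if row[0] != prev_row[0]:
            tmp.append(prev_row)
        prev_row = row
    # The final row always closes the last group.
    tmp.append(prev_row)
    return tmp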
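In a timeit test this is about 5 times faster than itertools.groupby. Demo: https://repl.it/C5Af/0
[Edit: the OP updated their question to say they are already using Pandas for the groupby, which may already be faster]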
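If you want to check the speed comparison on your own machine, a rough harness using the standard `timeit` module could look like the sketch below. The helper name `with_groupby` and the repetition counts are arbitrary choices for this example, and the numbers will vary with hardware and Python version.

```python
import itertools
import timeit

l = [['A', 'aa', '1', '300'],
     ['A', 'ab', '2', '30'],
     ['A', 'ac', '3', '60'],
     ['B', 'ba', '5', '50'],
     ['B', 'bb', '4', '10'],
     ['C', 'ca', '6', '50']]

def tidy(rows):
    """Single pass over rows that are already grouped by their first element."""
    if not rows:
        return []
    tmp = []
    prev_row = rows[0]
    for row in rows:
        if row[0] != prev_row[0]:
            tmp.append(prev_row)
        prev_row = row
    tmp.append(prev_row)
    return tmp

def with_groupby(rows):
    """Same result via itertools.groupby (hypothetical helper name)."""
    return [list(group)[-1]
            for _, group in itertools.groupby(rows, lambda x: x[0])]

# Both approaches should agree before their timings are compared.
assert tidy(l) == with_groupby(l)

for func in (tidy, with_groupby):
    # Take the best of 3 repeats of 10000 calls each.
    best = min(timeit.repeat(lambda: func(l), number=10000, repeat=3))
    print(func.__name__, best / 10000 * 1e6, "microseconds per call")
```

With only six rows, fixed call overhead dominates, so it is worth repeating the measurement on data of realistic size.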
Answer 1: (score: 1)
Even though I'm not sure I understand your question, here is what you could do:
li = [l[i][0] for i in range(len(l))]
[l[j][2] for j in [''.join(li).rfind(i) for i in set(li)]]
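Note that the output contains '3', '4' and '6' (in the iteration order of the set, which is not guaranteed), since the last occurrence of A seems to be the third one, not the second.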
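One caveat: joining the keys into a string and calling `rfind` only lines up with row indices when every key is a single character. A sketch of a variant that avoids that assumption, recording the last index per key in a dict instead (the names `last_index` and `third_column` are made up for this example):

```python
l = [['A', 'aa', '1', '300'],
     ['A', 'ab', '2', '30'],
     ['A', 'ac', '3', '60'],
     ['B', 'ba', '5', '50'],
     ['B', 'bb', '4', '10'],
     ['C', 'ca', '6', '50']]

# Later rows overwrite earlier ones, so each key ends up with its last index.
last_index = {row[0]: i for i, row in enumerate(l)}

# Sorting the indices restores the order in which those rows appear in l.
third_column = [l[i][2] for i in sorted(last_index.values())]
print(third_column)
```

This works for keys of any length and does not require the list to be sorted.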
Edit, since you seem to care a lot about performance (although you haven't said what you tried, or what counts as "good"):
%timeit li = [l[i][0] for i in range(len(l))]
%timeit [l[j][2] for j in [''.join(li).rfind(i) for i in set(li)]]
>> 1000000 loops, best of 3: 1.19 µs per loop
>> 100000 loops, best of 3: 2.57 µs per loop
%timeit [list(group)[-1][2] for key, group in itertools.groupby(l, lambda x: x[0])]
>> 100000 loops, best of 3: 5.11 µs per loop
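So it looks like the list comprehension is slightly faster than itertools (though I'm no benchmarking expert, and there may be a better way to run itertools).
Answer 2: (score: 1)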
{l[0]: l[2] for l in vals}
This gives you a mapping from 'A', 'B' and 'C' to their last values.
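Spelled out as a sketch, assuming `vals` is the list from the question: the comprehension works because a later row with the same key overwrites the earlier entry. On Python 3.7+, where dicts preserve insertion order, the values come out in the order each key was first seen.

```python
# Assumption: vals is the list from the question.
vals = [['A', 'aa', '1', '300'],
        ['A', 'ab', '2', '30'],
        ['A', 'ac', '3', '60'],
        ['B', 'ba', '5', '50'],
        ['B', 'bb', '4', '10'],
        ['C', 'ca', '6', '50']]

# Each later row with the same key replaces the value stored so far.
last_by_key = {row[0]: row[2] for row in vals}

# Insertion order (Python 3.7+) follows the first appearance of each key.
third_column = list(last_by_key.values())
print(last_by_key)
print(third_column)
```

Unlike the groupby approach, this does not need the list to be sorted by key.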
Answer 3: (score: 0)
A not very Pythonic approach: (note that Nils's solution is the most Pythonic, since it uses a list comprehension)
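def get_last_row(xs, q):
    # Scan from the end so the first match is the last occurrence of q.
    for i in range(len(xs) - 1, -1, -1):
        if xs[i][0] == q:
            return xs[i][2]
def get_third_cols(xs):
    third_cols = []
    for q in ["A", "B", "C"]:
        third_cols.append(get_last_row(xs, q))
    return third_cols
# xs is the list from the question (named l there).
print(get_third_cols(xs))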
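This prints ['3', '4', '6'], if that is what you mean by the last occurrence.
Answer 4: (score: 0)
This generalizes to any key/value positions. Note that the output is in the order in which each key was first observed. It would not be hard to adjust it so that the output follows the order in which the output values were observed.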
import operator
l = [['A', 'aa', '1', '300'],
['A', 'ab', '2', '30'],
['A', 'ac', '3', '60'],
['B', 'ba', '5', '50'],
['B', 'bb', '4', '10'],
['C', 'ca', '6', '50']]
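def getLast(data, key, value):
    # key and value are column positions; f pulls both out of a row at once.
    f = operator.itemgetter(key, value)
    store = dict()
    keys = []
    for row in data:
        # Use separate names so the positional parameters are not shadowed.
        k, v = f(row)
        if k not in store:
            keys.append(k)
        store[k] = v
    return [store[k] for k in keys]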
Now timing it,
%timeit getLast(l,0,2)
gives:
The slowest run took 9.44 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 2.85 µs per loop
Function output:
['3', '4', '6']
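As for the adjustment mentioned at the start of this answer (ordering by where the output values were observed rather than where each key first appeared), one possible sketch follows. It assumes Python 3.7+, where dicts preserve insertion order. The function name `get_last_by_last_seen` and the sample `rows` are made up for illustration.

```python
import operator

def get_last_by_last_seen(data, key, value):
    """Last value per key, ordered by where each key was last seen."""
    getter = operator.itemgetter(key, value)
    store = {}
    for row in data:
        k, v = getter(row)
        # Remove and re-insert so the key moves to the end of the dict.
        store.pop(k, None)
        store[k] = v
    return list(store.values())

# Hypothetical interleaved data where the two orderings differ.
rows = [['A', 'aa', '1'],
        ['B', 'ba', '2'],
        ['A', 'ab', '3']]

ordered = get_last_by_last_seen(rows, 0, 2)
print(ordered)
```

On the data from the question both orderings coincide, because the rows there are already grouped by key.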