I have a list that I want to sort by multiple keys, for example:
    L = [ ... ]
    L.sort(key=lambda x: (f(x), g(x)))
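This works fine. However, it results in unnecessary calls to g, which I would like to avoid (since g may be slow). In other words, I want to evaluate the key partially and lazily.
For example, if f is unique over L (i.e. len(L) == len(set(map(f, L)))), then g should never be called.
What is the most elegant/Pythonic way to do this?
One way I can think of is to define a custom cmp function (L.sort(cmp=partial_cmp)), but IMO this is less elegant and more complicated than using the key parameter.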
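For reference, a minimal sketch of what that cmp-based approach could look like. The question only names partial_cmp, so the helper make_partial_cmp and the sample f and g here are hypothetical; functools.cmp_to_key is used because list.sort no longer accepts a cmp argument in Python 3.

```python
from functools import cmp_to_key

def make_partial_cmp(*funcs):
    """Build a cmp-style function that tries each key function in turn."""
    def partial_cmp(a, b):
        for func in funcs:
            ka, kb = func(a), func(b)
            if ka != kb:
                # The first key that differs decides the order;
                # the remaining functions are never called for this pair.
                return -1 if ka < kb else 1
        return 0
    return partial_cmp

calls = {"f": 0, "g": 0}

def f(x):
    calls["f"] += 1
    return x % 10

def g(x):
    calls["g"] += 1
    return x

L = [13, 2, 41, 7, 25]  # f is unique over L, so g should never run
L.sort(key=cmp_to_key(make_partial_cmp(f, g)))
print(L, calls["g"])
```

Note that this re-evaluates f on every comparison, so it saves calls to g at the cost of repeated calls to f unless the functions are cached.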
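Another way is to define a key-wrapper class that takes a generator expression to produce the different parts of the key, and overrides the comparison operators to compare them one by one. However, I feel there must be a simpler way...
EDIT: I am interested in a solution to the general problem of sorting by multiple functions, not just two as in my example above.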
Answer 0 (score: 3)
You can try using itertools.groupby:
    from itertools import groupby

    result = []
    for groupKey, group in groupby(sorted(L, key=f), key=f):
        sublist = list(group)
        if len(sublist) > 1:
            # only groups with ties on f need to be sorted by g
            result += sorted(sublist, key=g)
        else:
            result += sublist
Another possibility, less elegant but in place:
    L.sort(key=f)
    start = None
    end = None
    for i, x in enumerate(L):
        if start is None:
            start = i
        elif f(x) == f(L[start]):
            end = i
        elif end is None:
            start = i
        else:
            # x starts a new run; sort the finished run of ties by g
            L[start:end+1] = sorted(L[start:end+1], key=g)
            start = i
            end = None
    if start is not None and end is not None:
        L[start:end+1] = sorted(L[start:end+1], key=g)
The first version, generalized to any number of functions:
    from itertools import groupby

    def sortBy(l, keyChain):
        if not keyChain:
            return l
        result = []
        f = keyChain[0]
        for groupKey, group in groupby(sorted(l, key=f), key=f):
            sublist = list(group)
            if len(sublist) > 1:
                # ties on f: recurse with the remaining key functions
                result += sortBy(sublist, keyChain[1:])
            else:
                result += sublist
        return result
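A small usage sketch for the generalized sortBy, repeated here so the snippet runs on its own. The key functions by_length and by_alpha and the word list are made up for illustration; the recorded arguments show that the second key function is only evaluated for elements that tie on the first.

```python
from itertools import groupby

def sortBy(l, keyChain):
    if not keyChain:
        return l
    result = []
    f = keyChain[0]
    for groupKey, group in groupby(sorted(l, key=f), key=f):
        sublist = list(group)
        if len(sublist) > 1:
            result += sortBy(sublist, keyChain[1:])
        else:
            result += sublist
    return result

alpha_args = []  # every argument the second key function is called with

def by_length(s):
    return len(s)

def by_alpha(s):
    alpha_args.append(s)
    return s

words = ["pear", "fig", "kiwi", "banana", "plum"]
result = sortBy(words, [by_length, by_alpha])
print(result)                    # ['fig', 'kiwi', 'pear', 'plum', 'banana']
print(sorted(set(alpha_args)))   # ['kiwi', 'pear', 'plum']
```

Each key function is still called twice per element it sees (once by sorted and once by groupby), so an expensive key may be worth caching.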
The second version, generalized to any number of functions (though not fully in place):
    def subSort(l, start, end, keyChain):
        part = l[start:end+1]
        sortBy(part, keyChain[1:])
        l[start:end+1] = part

    def sortBy(l, keyChain):
        if not keyChain:
            return
        f = keyChain[0]
        l.sort(key=f)
        start = None
        end = None
        for i, x in enumerate(l):
            if start is None:
                start = i
            elif f(x) == f(l[start]):
                end = i
            elif end is None:
                start = i
            else:
                subSort(l, start, end, keyChain)
                start = i
                end = None
        if start is not None and end is not None:
            subSort(l, start, end, keyChain)
Answer 1 (score: 2)
Given a function, you can create a LazyComparer class like this:
    def lazy_func(func):
        class LazyComparer(object):
            def __init__(self, x):
                self.x = x
            def __lt__(self, other):
                return func(self.x) < func(other.x)
            def __eq__(self, other):
                return func(self.x) == func(other.x)
        return lambda x: LazyComparer(x)
To make a lazy key function out of multiple functions, you can create a utility function:
    def make_lazy(*funcs):
        def wrapper(x):
            return [lazy_func(f)(x) for f in funcs]
        return wrapper
They can be used like this:
    import functools

    def countcalls(f):
        "Decorator that makes the function count calls to it."
        @functools.wraps(f)  # keep f's name so report_calls can print it
        def _f(*args, **kwargs):
            _f._count += 1
            return f(*args, **kwargs)
        _f._count = 0
        return _f

    @countcalls
    def g(x): return x

    @countcalls
    def f1(x): return 0

    @countcalls
    def f2(x): return x

    def report_calls(*funcs):
        print(' | '.join(['{} calls to {}'.format(f._count, f.__name__)
                          for f in funcs]))

    L = list(range(10))[::-1]
    L.sort(key=make_lazy(f1, g))
    report_calls(f1, g)

    g._count = 0
    L.sort(key=make_lazy(f2, g))
    report_calls(f2, g)
which produces
    18 calls to f1 | 36 calls to g
    36 calls to f2 | 0 calls to g
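The @countcalls decorator above is used to confirm that when f1 returns many ties, g is called to break them, but when f2 returns distinct values, g is not called.
NPE's solution adds memoization inside the Key class. With the solution above, you can add memoization outside of (and independently of) the LazyComparer class: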
    def memo(f):
        # Author: Peter Norvig
        """Decorator that caches the return value for each call to f(args).
        Then when called again with same args, we can just look it up."""
        cache = {}
        def _f(*args):
            try:
                return cache[args]
            except KeyError:
                cache[args] = result = f(*args)
                return result
            except TypeError:
                # some element of args can't be a dict key
                return f(*args)
        _f.cache = cache
        return _f

    f1._count = g._count = 0  # reset the counters left over from the runs above
    L.sort(key=make_lazy(memo(f1), memo(g)))
    report_calls(f1, g)
which reduces the number of calls to g:
    10 calls to f1 | 10 calls to g
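On Python 3, the standard library's functools.lru_cache could stand in for the hand-written memo decorator. The sketch below is an assumption about how the pieces might be combined, not part of the original answer: it repeats lazy_func and make_lazy so it runs on its own (building each comparer class once per function rather than once per element), and counts calls with a plain dict instead of @countcalls.

```python
from functools import lru_cache

def lazy_func(func):
    class LazyComparer(object):
        def __init__(self, x):
            self.x = x
        def __lt__(self, other):
            return func(self.x) < func(other.x)
        def __eq__(self, other):
            return func(self.x) == func(other.x)
    return LazyComparer

def make_lazy(*funcs):
    wrappers = [lazy_func(f) for f in funcs]  # one comparer class per function
    def wrapper(x):
        return [w(x) for w in wrappers]
    return wrapper

calls = {"f1": 0, "g": 0}

def f1(x):
    calls["f1"] += 1
    return 0          # every element ties on f1

def g(x):
    calls["g"] += 1
    return x

L = list(range(10))[::-1]
cached = lru_cache(maxsize=None)  # unbounded cache, one entry per distinct argument
L.sort(key=make_lazy(cached(f1), cached(g)))
print(L, calls)
```

Unlike memo, lru_cache raises TypeError for unhashable arguments instead of falling back to an uncached call, so this variant assumes the list elements are hashable.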
Answer 2 (score: 1)
You can use a key object that lazily evaluates and caches g(x):
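    # Python 2 only: __cmp__ and cmp() were removed in Python 3.
    class Key(object):
        def __init__(self, obj):
            self.obj = obj
            self.f = f(obj)
        @property
        def g(self):
            # compute g lazily, the first time a tie on f has to be broken
            if not hasattr(self, "_g"):
                self._g = g(self.obj)
            return self._g
        def __cmp__(self, rhs):
            return cmp(self.f, rhs.f) or cmp(self.g, rhs.g)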
Here is an example of use:
    def f(x):
        f.count += 1
        return x // 2
    f.count = 0

    def g(x):
        g.count += 1
        return x
    g.count = 0

    L = [1, 10, 2, 33, 45, 90, 3, 6, 1000, 1]
    print sorted(L, key=Key)
    print f.count, g.count
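The Key class above relies on __cmp__ and cmp, which exist only in Python 2. A possible Python 3 adaptation of the same idea, generalized to any number of key functions, is sketched below. The names make_key and LazyKey are hypothetical, and the sketch relies on list sorting only needing __lt__.

```python
def make_key(*funcs):
    """Return a key class that evaluates funcs lazily and caches the results."""
    class LazyKey(object):
        def __init__(self, obj):
            self.obj = obj
            self._values = []  # cached results of funcs[0], funcs[1], ...
        def _value(self, i):
            # Evaluate key functions up to index i on demand, caching each one.
            while len(self._values) <= i:
                self._values.append(funcs[len(self._values)](self.obj))
            return self._values[i]
        def __lt__(self, other):
            for i in range(len(funcs)):
                a, b = self._value(i), other._value(i)
                if a != b:
                    return a < b  # first differing key decides
            return False          # equal on every key
    return LazyKey

calls = {"f": 0, "g": 0}

def f(x):
    calls["f"] += 1
    return x // 2

def g(x):
    calls["g"] += 1
    return x

L = [1, 10, 2, 33, 45, 90, 3, 6, 1000, 1]
result = sorted(L, key=make_key(f, g))
print(result)  # [1, 1, 2, 3, 6, 10, 33, 45, 90, 1000]
print(calls)
```

Here g is only evaluated for the elements that tie on f (the two 1s, plus 2 and 3), and each function runs at most once per element because the results are cached on the key object.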