在python中,我知道两个懒惰的“容器”:生成器和<class 'map'>
。
两者都不可订阅。因此map(f, data)[1]
和(f(x) for x in data)[1]
会失败。
python中是否有支持下标的延迟映射类?
如果没有最接近的匹配?
我一直在搜索functools
无效(或者我没有发现它)。
基本上我正在寻找这样的东西(但重新发明轮子应该是最后一个选择):
class lazilymappedlist:
def __init__ (self, f, lst):
self.f = f
self.data = [ (False, e) for e in lst]
def __getitem__ (self, idx):
status, elem = self.data [idx]
if status: return elem
elem = self.f (elem)
self.data [idx] = (True, elem)
return elem
答案 0 :(得分:1)
由于各种原因,这是一个棘手的事情。它可以很少的努力实现,假设输入数据,(时间)映射和访问模式的复杂性没有任何内容,但是随后首先使用生成器的一个或多个优点消失了。在问题中给出的示例代码中,不必跟踪所有值的优点就会丢失。
如果我们允许纯随机访问模式,那么至少必须缓存所有映射值,再次失去生成器的内存优势。
根据两个假设
问题中的示例代码应该没问题。这里有一些稍微不同的代码,它们也可以将生成器作为传入数据处理。它的优点是不会在对象构造上详尽地建立传入数据的副本。
因此,如果传入数据具有__getitem__
方法(索引),则使用self.elements
dict实现精简缓存包装器。如果访问稀疏,则字典比列表更有效。如果传入的数据没有索引,我们只需使用和存储待处理的传入数据以供以后映射。
代码:
class LazilyMapped:
def __init__(self, fn, iterable):
"""LazilyMapped lazily evaluates a mapping/transformation function on incoming values
Assumes mapping is expensive, and [idx] access is random and possibly sparse.
Still, this may defeat having a memory efficient generator on the incoming data,
because we must remember all read data for potential future access.
Lots of different optimizations could be done if there's more information on the
access pattern. For example, memory could be saved if we knew that the access idx
is monotonic increasing (then a list storage would be more efficient, because
forgetting data is then trivial), or if we'd knew that any index is only accessed
once (we could get rid of the infite cache), or if the access is not sparse, but
random we should be using a list instead of a dict for self.elements.
fn is a filter function, getting one element of iterable and returning a bool,
iterable may be a generator
"""
self.fn = fn
self.sparse_in = hasattr(iterable, '__getitem__')
if self.sparse_in:
self.original_values = iterable
else:
self.iter = iter(iterable)
self.original_idx = 0 # keep track of which index to do next in incoming data
self.original_values = [] # keep track of all incoming values
self.elements = {} # forever remember mapped data
def proceed_to(self, idx):
"""Consume incoming values and store for later mapping"""
if idx >= self.original_idx:
for _ in range(self.original_idx, idx + 1):
self.original_values.append(next(self.iter))
self.original_idx = idx + 1
def __getitem__(self, idx):
if idx not in self.elements:
if not self.sparse_in:
self.proceed_to(idx)
self.elements[idx] = mapped = self.fn(self.original_values[idx])
else:
mapped = self.elements[idx]
return mapped
if __name__ == '__main__':
test_list = [1,2,3,4,5]
dut = LazilyMapped(lambda v: v**2, test_list)
assert dut[0] == 1
assert dut[2] == 9
assert dut[1] == 4
dut = LazilyMapped(lambda v: v**2, (num for num in range(1, 7)))
assert dut[0] == 1
assert dut[2] == 9
assert dut[1] == 4
答案 1 :(得分:0)
我不知道标准python库中的任何这样的容器,因此我自己在library中实现它,模仿itertools的可索引对象:
import seqtools
def do(x):
print("-> computing now")
return x + 2
a = [1, 2, 3, 4]
m = seqtools.smap(do, a)
m = seqtools.add_cache(m, len(m))
# nothing printed because evaluation is delayed
m[0]
- &GT;现在计算
3