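Given a large Dynamo table with many items, I would like to be able to start a scan and later resume iterating over it from an unrelated Python context, as if I had simply kept calling next() on the gen() of the scan itself.
What I'm trying to avoid: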
offset = 500
count = 25
scan_gen = engine.scan(AModel).gen()
for _ in range(offset):
scan_gen.next()
results = [scan_gen.next() for _ in range(count)]
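because this requires restarting the scan from the top every time.
I see that the DynamoDB API normally works in a cursor-like fashion through the LastEvaluatedKey property: http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Scan.html
Is there a way to use this to jump ahead in a flywheel scan generator?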
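Failing that, is there a way to serialize the state of the generator? I tried pickling the generator, but it fails with a pickle.PicklingError caused by a name resolution problem: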
>>> with open('/tmp/dump_dynamo_result', 'wb') as out_fp:
... pickle.dump(engine.scan(AModel).gen(), out_fp, pickle.HIGHEST_PROTOCOL)
...
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "/usr/lib/python2.7/pickle.py", line 1370, in dump
Pickler(file, protocol).dump(obj)
File "/usr/lib/python2.7/pickle.py", line 224, in dump
self.save(obj)
File "/usr/lib/python2.7/pickle.py", line 331, in save
self.save_reduce(obj=obj, *rv)
File "/usr/lib/python2.7/pickle.py", line 396, in save_reduce
save(cls)
File "/usr/lib/python2.7/pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python2.7/pickle.py", line 748, in save_global
(obj, module, name))
pickle.PicklingError: Can't pickle <type 'generator'>: it's not found as __builtin__.generator
Answer 0 (score: 0)
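Yes, you can build the LastEvaluatedKey yourself and pass it as the ExclusiveStartKey of another scan. It is basically the key (hash, or hash/range) of the last item that was evaluated. Because the start key is exclusive, the new scan resumes with the item immediately after it.
If you want to see what it looks like, run a scan with Limit set to 2 or less. The response will include a LastEvaluatedKey, which you can inspect to work out how to build one yourself. Then pick the item in the database where you want the scan to resume and create a LastEvaluatedKey from its key attributes.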
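As a rough illustration of the approach above, here is a minimal sketch that works against the low-level DynamoDB Scan API rather than flywheel's generator. It assumes a boto3-style client object whose scan() accepts TableName, Limit and ExclusiveStartKey. The helper names scan_page, dump_cursor and load_cursor, and the table name used in the comment, are hypothetical and only for illustration. Whether flywheel itself exposes the start key is not something this sketch relies on.

```python
import json


def scan_page(client, table_name, page_size, start_key=None):
    """Fetch one page of a Scan, resuming after start_key if one is given.

    `client` is assumed to behave like a boto3 DynamoDB client, e.g.
    boto3.client("dynamodb"). Returns (items, last_evaluated_key), where
    last_evaluated_key is None once the table has been fully scanned.
    """
    kwargs = {"TableName": table_name, "Limit": page_size}
    if start_key is not None:
        # The start key is exclusive: the scan resumes after this item.
        kwargs["ExclusiveStartKey"] = start_key
    response = client.scan(**kwargs)
    return response.get("Items", []), response.get("LastEvaluatedKey")


def dump_cursor(last_key):
    """Serialize a LastEvaluatedKey so another process can pick it up.

    Works for string and number key attributes; binary ("B") key
    attributes would need extra encoding before JSON serialization.
    """
    return json.dumps(last_key)


def load_cursor(text):
    """Inverse of dump_cursor."""
    return json.loads(text)


# Hypothetical usage against a real table:
#   import boto3
#   client = boto3.client("dynamodb")
#   items, key = scan_page(client, "a_model_table", 25)
#   saved = dump_cursor(key)          # store in a file, cache, etc.
#   ...later, in an unrelated process...
#   items, key = scan_page(client, "a_model_table", 25, load_cursor(saved))
```

The serialized cursor takes the place of the pickled generator: only the key of the last evaluated item has to be persisted, not the generator state. To start from an arbitrary item instead, pass that item's key attributes in the same format as start_key.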