How can I efficiently retrieve a non-initial slice of a Flywheel scan?

Posted: 2015-10-04 00:48:00

Tags: python amazon-dynamodb

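Given a large Dynamo table with many items, I would like to be able to start a scan and later resume iterating over it from an unrelated Python context, just as if I had kept calling next() on the gen() of the scan itself.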

What I want to avoid:

offset = 500
count = 25
scan_gen = engine.scan(AModel).gen()
for _ in range(offset):
    next(scan_gen)  # discard the first `offset` items
results = [next(scan_gen) for _ in range(count)]

because this restarts the scan from the beginning every time.

I see that the DynamoDB API generally uses the LastEvaluatedKey attribute like a cursor: http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Scan.html
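A minimal sketch of that cursor pattern, written against the raw DynamoDB API rather than Flywheel. It assumes `scan` is a callable shaped like boto3's low-level `client.scan` (keyword arguments `TableName`, `Limit`, `ExclusiveStartKey`; a dict response with `Items` and, when more data remains, `LastEvaluatedKey`). `fetch_slice` is a hypothetical helper, not part of Flywheel or boto3. It keeps requesting pages until it has `count` items, because a single Scan page may return fewer items than `Limit` (for example when the 1 MB page limit is hit).

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Key = Dict[str, Any]


def fetch_slice(
    scan: Callable[..., Dict[str, Any]],
    table_name: str,
    count: int,
    start_key: Optional[Key] = None,
) -> Tuple[List[Dict[str, Any]], Optional[Key]]:
    """Return up to `count` items plus a cursor for resuming the scan.

    `start_key` is a previous LastEvaluatedKey (or None to start at the top).
    The returned cursor is None once the scan has reached the end of the table.
    """
    items: List[Dict[str, Any]] = []
    cursor = start_key
    while len(items) < count:
        # Ask only for what is still missing, so the cursor never overshoots.
        kwargs: Dict[str, Any] = {"TableName": table_name, "Limit": count - len(items)}
        if cursor is not None:
            kwargs["ExclusiveStartKey"] = cursor
        response = scan(**kwargs)
        items.extend(response.get("Items", []))
        cursor = response.get("LastEvaluatedKey")
        if cursor is None:  # no more data in the table
            break
    return items, cursor
```

With boto3, `client.scan` could be passed directly as `scan`; the table name would be whatever Flywheel created for `AModel`, which is not shown in the question. Each call costs only the reads for the requested slice instead of re-reading everything before `offset`.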

Is there a way to use it to skip ahead in Flywheel's scan generator?

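Failing that, is there a way to serialize the generator's state? I tried pickling the generator, but it raised a pickle.PicklingError because of a name-resolution problem: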

>>> with open('/tmp/dump_dynamo_result', 'wb') as out_fp:
...  pickle.dump(engine.scan(AModel).gen(), out_fp, pickle.HIGHEST_PROTOCOL)
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/lib/python2.7/pickle.py", line 1370, in dump
    Pickler(file, protocol).dump(obj)
  File "/usr/lib/python2.7/pickle.py", line 224, in dump
    self.save(obj)
  File "/usr/lib/python2.7/pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib/python2.7/pickle.py", line 396, in save_reduce
    save(cls)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 748, in save_global
    (obj, module, name))
pickle.PicklingError: Can't pickle <type 'generator'>: it's not found as __builtin__.generator
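Generator objects cannot be pickled, but the cursor itself is plain data, so one possible workaround is to persist the LastEvaluatedKey instead of the generator. A minimal sketch, assuming the key comes from the low-level API in typed form (e.g. `{"id": {"S": "abc"}}`) and that the key attributes are string (`S`) or number (`N`) types; binary (`B`) keys, or numbers returned as `Decimal` by boto3's resource layer, would need extra handling before `json` can encode them. The helper names are hypothetical.

```python
import json
from typing import Any, Dict, Optional

Cursor = Optional[Dict[str, Any]]


def cursor_to_token(cursor: Cursor) -> str:
    """Serialize a LastEvaluatedKey (or None) to a JSON string."""
    return json.dumps(cursor, sort_keys=True)


def token_to_cursor(token: str) -> Cursor:
    """Inverse of cursor_to_token; the result can be passed as ExclusiveStartKey."""
    return json.loads(token)
```

The token can be written to a file such as `/tmp/dump_dynamo_result` and read back later by an unrelated process, which then passes the decoded dict as `ExclusiveStartKey` on its next Scan.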

1 answer:

Answer 0 (score: 0)

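Yes, you can construct the LastEvaluatedKey yourself and pass it as the ExclusiveStartKey of another scan. It is essentially the key (hash, or hash and range) of the last item that was evaluated, i.e. the position from which you want the scan to continue. As the name ExclusiveStartKey suggests, that item itself is not returned again; the new scan resumes immediately after it.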

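To see what it looks like, run a scan with the limit set to 2 or less. The response will include a LastEvaluatedKey that you can inspect to figure out how to build one yourself. Then pick the item in the database after which you want the scan to resume, and create a LastEvaluatedKey from its key attributes.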
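Building the key from an item amounts to keeping only its primary-key attributes. A minimal sketch, assuming items in the low-level typed format and a `KeySchema` list as returned by `describe_table(...)["Table"]["KeySchema"]`; both helper names are hypothetical. Keep in mind that a Scan does not walk the table in key order, so "the item to resume after" means its position in scan order.

```python
from typing import Any, Dict, Iterable, List


def key_names_from_schema(key_schema: List[Dict[str, str]]) -> List[str]:
    """Attribute names of the table's hash key and, if present, range key."""
    return [entry["AttributeName"] for entry in key_schema]


def start_key_from_item(item: Dict[str, Any], key_names: Iterable[str]) -> Dict[str, Any]:
    """Keep only the primary-key attributes, suitable for ExclusiveStartKey."""
    return {name: item[name] for name in key_names}
```

For a hash-only table the result has a single attribute; for a hash/range table it has two, matching the shape of the LastEvaluatedKey you saw in the small-limit scan.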