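I'm struggling to find the Faraday incantation that scans an entire DDB table. The following call produces output, but it returns far fewer results than the 18M records I know are in the table.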
(far/scan
  common/client-opts
  v2-index/layer-table-name
  {:return #{:layer-key :range-key}})
=>
[{:range-key "soil&2015-07-22T15:13:09.101Z&ssurgo&v1", :layer-key "886985&886985"}
{:range-key "soil&2015-07-29T19:20:09.973Z&ssurgo&v1", :layer-key "886985&886985"}
...
{:range-key "veg&2014-05-29T16:16:31.000Z&true-color&v1", :layer-key "1674603&1674603"}
{:range-key "veg&2014-06-14T16:16:39.000Z&abs&v1", :layer-key "1674603&1674603"}]
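What can I do to get Faraday to process all of the records? The source code suggests there is a :last-prim-kvs option, but it's not clear to me what should go in there. The primary key on this DDB table is a composite key made up of :layer-key and :range-key.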
Answer 0: (score: 1)
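If it all fits in memory, this works...
The key to the whole scheme is to set up the opts map with a :limit 99 entry along with a :span-reqs {:max 1} entry. The :span-reqs entry is thoroughly obscure to me, but it seems to be the real driver behind the conceptual "page size" (as far as I can tell from Faraday's docstring, it controls automatic multi-request stitching, so {:max 1} keeps each call to a single request). I've set up a 10-element table like this...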
;; This only works on the whole table because the table is small!!!!
(far/scan
  common/client-opts
  "users.robert.kuhar.wtf_far"
  {:return #{:part_key :sort_key :note}})
=>
[{:part_key "456", :sort_key "fha.abs", :note "\"456\",\"fha.abs\" created 2016-12-08T21:32:20.789Z."}
{:part_key "456", :sort_key "fha.rank", :note "\"456\",\"fha.rank\" created 2016-12-08T21:32:20.789Z."}
{:part_key "456", :sort_key "fha.raw", :note "\"456\",\"fha.raw\" created 2016-12-08T21:32:20.789Z."}
{:part_key "456", :sort_key "fha.true-color", :note "\"456\",\"fha.true-color\" created 2016-12-08T21:32:20.789Z."}
{:part_key "456", :sort_key "soil.ssurgo", :note "\"456\",\"soil.ssurgo\" created 2016-12-08T21:32:20.789Z."}
{:part_key "123", :sort_key "fha.abs", :note "\"123\",\"fha.abs\" created 2016-12-08T21:24:30.139Z."}
{:part_key "123", :sort_key "fha.rank", :note "\"123\",\"fha.rank\" created 2016-12-08T21:24:30.139Z"}
{:part_key "123", :sort_key "fha.raw", :note "\"123\",\"fha.raw\" created 2016-12-08T21:24:30.139Z."}
{:part_key "123", :sort_key "fha.true-color", :note "\"123\",\"fha.true-color\" created 2016-12-08T21:24:30.139Z."}
{:part_key "123", :sort_key "soil.ssurgo", :note "\"123\",\"soil.ssurgo\" created 2016-12-08T21:24:30.139Z."}]
If I want to walk through this table 4 elements at a time, the initial call is...
(far/scan
  common/client-opts
  "users.robert.kuhar.wtf_far"
  {:return #{:part_key :sort_key :note}
   :limit 4
   :span-reqs {:max 1}})
=>
[{:part_key "456", :sort_key "fha.abs", :note "\"456\",\"fha.abs\" created 2016-12-08T21:32:20.789Z."}
{:part_key "456", :sort_key "fha.rank", :note "\"456\",\"fha.rank\" created 2016-12-08T21:32:20.789Z."}
{:part_key "456", :sort_key "fha.raw", :note "\"456\",\"fha.raw\" created 2016-12-08T21:32:20.789Z."}
{:part_key "456", :sort_key "fha.true-color", :note "\"456\",\"fha.true-color\" created 2016-12-08T21:32:20.789Z."}]
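All subsequent calls need :last-prim-kvs {:part_key "xxx" :sort_key "yyy"} set in the options map to tell Faraday where to pick up. The values are the primary key of the last item on the previous page. For page 2, the call looks like...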
(far/scan
  common/client-opts
  "users.robert.kuhar.wtf_far"
  {:return #{:part_key :sort_key :note}
   :limit 4
   :span-reqs {:max 1}
   :last-prim-kvs {:part_key "456" :sort_key "fha.true-color"}})
=>
[{:part_key "456", :sort_key "soil.ssurgo", :note "\"456\",\"soil.ssurgo\" created 2016-12-08T21:32:20.789Z."}
{:part_key "123", :sort_key "fha.abs", :note "\"123\",\"fha.abs\" created 2016-12-08T21:24:30.139Z."}
{:part_key "123", :sort_key "fha.rank", :note "\"123\",\"fha.rank\" created 2016-12-08T21:24:30.139Z"}
{:part_key "123", :sort_key "fha.raw", :note "\"123\",\"fha.raw\" created 2016-12-08T21:24:30.139Z."}]
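The key passed as :last-prim-kvs is just the primary-key attributes of the last item on the previous page. A minimal sketch of that step follows; next-start-key is a hypothetical helper name, not part of Faraday, and it assumes the key attributes are included in the :return projection.

```clojure
;; Hypothetical helper (not part of Faraday): derive the :last-prim-kvs map
;; for the next request from the page that was just returned.
(defn next-start-key
  "Returns a map of the primary-key attributes of the last item in `page`,
  or nil when `page` is empty. `key-attrs` names the table's key attributes."
  [page key-attrs]
  (when-let [item (last page)]
    (select-keys item key-attrs)))

(next-start-key
  [{:part_key "456" :sort_key "fha.raw" :note "..."}
   {:part_key "456" :sort_key "fha.true-color" :note "..."}]
  [:part_key :sort_key])
;; => {:part_key "456", :sort_key "fha.true-color"}
```

For the table in the question, key-attrs would presumably be [:layer-key :range-key].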
The last page of my 10-element table is...
(far/scan
  common/client-opts
  "users.robert.kuhar.wtf_far"
  {:return #{:part_key :sort_key :note}
   :limit 4
   :span-reqs {:max 1}
   :last-prim-kvs {:part_key "123" :sort_key "fha.raw"}})
=>
[{:part_key "123", :sort_key "fha.true-color", :note "\"123\",\"fha.true-color\" created 2016-12-08T21:24:30.139Z."}
{:part_key "123", :sort_key "soil.ssurgo", :note "\"123\",\"soil.ssurgo\" created 2016-12-08T21:24:30.139Z."}]
Even though I asked for 4 elements, I got only 2. Trying to scan past the end always comes back empty.
(far/scan
  common/client-opts
  "users.robert.kuhar.wtf_far"
  {:return #{:part_key :sort_key :note}
   :limit 4
   :span-reqs {:max 1}
   :last-prim-kvs {:part_key "123" :sort_key "soil.ssurgo"}})
=> []
So here it is end to end, which works as long as everything fits in memory.
(loop [accum []
       page (far/scan
              client-opts
              "users.robert.kuhar.wtf_far"
              {:limit 4
               :span-reqs {:max 1}})]
  (if (empty? page)
    accum
    (let [last-on-page (last page)
          last-part-key (:part_key last-on-page)
          last-sort-key (:sort_key last-on-page)]
      (recur
        (into accum page)
        (far/scan
          client-opts
          "users.robert.kuhar.wtf_far"
          {:limit 4
           :span-reqs {:max 1}
           :last-prim-kvs {:part_key last-part-key :sort_key last-sort-key}})))))
=>
[{:part_key "456", :sort_key "fha.abs", :note "\"456\",\"fha.abs\" created 2016-12-08T21:32:20.789Z."}
...
{:part_key "123", :sort_key "soil.ssurgo", :note "\"123\",\"soil.ssurgo\" created 2016-12-08T21:24:30.139Z."}]
I think this is the sad final answer to "How can I get faraday/scan to walk an entire DynamoDB table?": it doesn't. You need to build it by hand.
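For a table too large to accumulate in a single vector, the same hand-built paging can be expressed as a lazy sequence of pages. This is only a sketch under assumptions: scan-pages and fetch-page are hypothetical names, not Faraday functions, and the page fetcher is passed in as a function so the paging logic can be tried against an in-memory stand-in before pointing it at DynamoDB.

```clojure
;; Hypothetical helper (not part of Faraday): a lazy sequence of pages.
;; `fetch-page` takes a start key (nil for the first page) and returns a
;; possibly empty collection of items; `key-attrs` names the key attributes.
(defn scan-pages
  [fetch-page key-attrs]
  (letfn [(step [start-key]
            (lazy-seq
              (let [page (fetch-page start-key)]
                ;; An empty page ends the sequence, as in the loop above.
                (when (seq page)
                  (cons page
                        (step (select-keys (last page) key-attrs)))))))]
    (step nil)))

;; In-memory stand-in for a 10-item table, paged 4 items at a time.
(def fake-items
  (vec (for [i (range 10)] {:part_key "p" :sort_key i})))

(defn fake-fetch [start-key]
  (let [offset (if start-key (inc (:sort_key start-key)) 0)]
    (vec (take 4 (drop offset fake-items)))))

(map count (scan-pages fake-fetch [:part_key :sort_key]))
;; => (4 4 2)

;; Assumed wiring to Faraday, reusing the options from the calls above.
(comment
  (defn fetch-page [start-key]
    (far/scan common/client-opts
              "users.robert.kuhar.wtf_far"
              (cond-> {:limit 4 :span-reqs {:max 1}}
                start-key (assoc :last-prim-kvs start-key))))

  ;; Count the items one page at a time instead of collecting them all.
  (reduce + (map count (scan-pages fetch-page [:part_key :sort_key]))))
```

Because each page is fetched only when the sequence is consumed, a caller can process or count pages one at a time. When the item count is an exact multiple of the limit, one extra request that returns an empty page ends the sequence, matching the behavior shown earlier.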