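How can I get faraday/scan to traverse the whole DynamoDB table?

Time: 2016-11-17 00:26:43

Tags: clojure

I'm struggling to find the right Faraday incantation to scan an entire DDB table. The following call produces output, but it returns far fewer results than the 18M records I know are in the table.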

(far/scan 
  common/client-opts 
  v2-index/layer-table-name
  {:return #{:layer-key :range-key}})
=>
[{:range-key "soil&2015-07-22T15:13:09.101Z&ssurgo&v1", :layer-key "886985&886985"}
 {:range-key "soil&2015-07-29T19:20:09.973Z&ssurgo&v1", :layer-key "886985&886985"}
  ...
 {:range-key "veg&2014-05-29T16:16:31.000Z&true-color&v1", :layer-key "1674603&1674603"}
 {:range-key "veg&2014-06-14T16:16:39.000Z&abs&v1", :layer-key "1674603&1674603"}]

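What can I do to make Faraday process every record? The source code suggests there is a :last-prim-kvs option, but it isn't clear to me what it does. The primary key on this DDB table is a composite key made up of :layer-key and :range-key.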

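1 answer:

Answer 0 (score: 1)

If it fits in memory, this works...

The key to the whole scheme is setting up the opts map with :limit 99 along with :span-reqs {:max 1}. The :span-reqs map is completely opaque to me, but it seems to be what really drives the notional "page size". I have set up a 10-element table like this...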

;; This only works on the whole table because the table is small!!!!
(far/scan
  common/client-opts
  "users.robert.kuhar.wtf_far"
  {:return #{:part_key :sort_key :note}})
=>
[{:part_key "456", :sort_key "fha.abs", :note "\"456\",\"fha.abs\" created 2016-12-08T21:32:20.789Z."}
 {:part_key "456", :sort_key "fha.rank", :note "\"456\",\"fha.rank\" created 2016-12-08T21:32:20.789Z."}
 {:part_key "456", :sort_key "fha.raw", :note "\"456\",\"fha.raw\" created 2016-12-08T21:32:20.789Z."}
 {:part_key "456", :sort_key "fha.true-color", :note "\"456\",\"fha.true-color\" created 2016-12-08T21:32:20.789Z."}
 {:part_key "456", :sort_key "soil.ssurgo", :note "\"456\",\"soil.ssurgo\" created 2016-12-08T21:32:20.789Z."}
 {:part_key "123", :sort_key "fha.abs", :note "\"123\",\"fha.abs\" created 2016-12-08T21:24:30.139Z."}
 {:part_key "123", :sort_key "fha.rank", :note "\"123\",\"fha.rank\" created 2016-12-08T21:24:30.139Z"}
 {:part_key "123", :sort_key "fha.raw", :note "\"123\",\"fha.raw\" created 2016-12-08T21:24:30.139Z."}
 {:part_key "123", :sort_key "fha.true-color", :note "\"123\",\"fha.true-color\" created 2016-12-08T21:24:30.139Z."}
 {:part_key "123", :sort_key "soil.ssurgo", :note "\"123\",\"soil.ssurgo\" created 2016-12-08T21:24:30.139Z."}]

If I want to walk through it 4 elements at a time, the initial call is...

(far/scan
  common/client-opts
  "users.robert.kuhar.wtf_far"
  {:return #{:part_key :sort_key :note}
   :limit 4
   :span-reqs {:max 1}})
=>
[{:part_key "456", :sort_key "fha.abs", :note "\"456\",\"fha.abs\" created 2016-12-08T21:32:20.789Z."}
 {:part_key "456", :sort_key "fha.rank", :note "\"456\",\"fha.rank\" created 2016-12-08T21:32:20.789Z."}
 {:part_key "456", :sort_key "fha.raw", :note "\"456\",\"fha.raw\" created 2016-12-08T21:32:20.789Z."}
 {:part_key "456", :sort_key "fha.true-color", :note "\"456\",\"fha.true-color\" created 2016-12-08T21:32:20.789Z."}]
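The opts map for each page differs only in whether :last-prim-kvs is present. A minimal sketch of a helper that builds it, assuming a base map such as {:return #{:part_key :sort_key :note}}; page-opts is a hypothetical name, not a Faraday function, and the :limit and :span-reqs values simply mirror the calls in this answer.

```clojure
;; Hypothetical helper (not part of Faraday): builds the opts map for one page.
;; `start-key` is nil for the first page, otherwise the primary-key map of the
;; last item on the previous page.
(defn page-opts
  [base-opts page-size start-key]
  (cond-> (assoc base-opts
                 :limit page-size      ; page size, as in the calls above
                 :span-reqs {:max 1})  ; per this answer, keeps each call to one page (assumption)
    ;; only add :last-prim-kvs when resuming after a previous page
    start-key (assoc :last-prim-kvs start-key)))
```

With this, the first call above corresponds to (page-opts {:return #{:part_key :sort_key :note}} 4 nil).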

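Every subsequent call needs :last-prim-kvs {:part_key "xxx" :sort_key "yyy"} set in the options map, holding the key of the last item from the previous page, to tell Faraday where to pick up. For page 2, the call looks like...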

(far/scan
  common/client-opts
  "users.robert.kuhar.wtf_far"
  {:return #{:part_key :sort_key :note}
   :limit 4
   :span-reqs {:max 1}
   :last-prim-kvs {:part_key "456" :sort_key "fha.true-color"}})
=>
[{:part_key "456", :sort_key "soil.ssurgo", :note "\"456\",\"soil.ssurgo\" created 2016-12-08T21:32:20.789Z."}
 {:part_key "123", :sort_key "fha.abs", :note "\"123\",\"fha.abs\" created 2016-12-08T21:24:30.139Z."}
 {:part_key "123", :sort_key "fha.rank", :note "\"123\",\"fha.rank\" created 2016-12-08T21:24:30.139Z"}
 {:part_key "123", :sort_key "fha.raw", :note "\"123\",\"fha.raw\" created 2016-12-08T21:24:30.139Z."}]
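The :last-prim-kvs value used for page 2 is just the primary-key part of the last item on page 1. A minimal sketch of a helper that derives it, assuming each page is a sequence of maps (as far/scan returns above) and that the key attributes are included in :return; next-start-key is a hypothetical name, not a Faraday function.

```clojure
;; Hypothetical helper (not part of Faraday): computes the :last-prim-kvs value
;; for the next call from the current page. Returns nil for an empty page.
(defn next-start-key
  [page key-attrs]
  (when (seq page)
    ;; keep only the primary-key attributes of the last item on the page
    (select-keys (last page) key-attrs)))
```

For the table in the question, key-attrs would be [:layer-key :range-key].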

The last page of my 10-element table is...

(far/scan
  common/client-opts
  "users.robert.kuhar.wtf_far"
  {:return #{:part_key :sort_key :note}
   :limit 4
   :span-reqs {:max 1}
   :last-prim-kvs {:part_key "123" :sort_key "fha.raw"}})
=>
[{:part_key "123", :sort_key "fha.true-color", :note "\"123\",\"fha.true-color\" created 2016-12-08T21:24:30.139Z."}
 {:part_key "123", :sort_key "soil.ssurgo", :note "\"123\",\"soil.ssurgo\" created 2016-12-08T21:24:30.139Z."}]

Only 2 elements come back, even though I asked for 4. Trying to go past the end of the scan always comes back empty.

(far/scan
  common/client-opts
  "users.robert.kuhar.wtf_far"
  {:return #{:part_key :sort_key :note}
   :limit 4
   :span-reqs {:max 1}
   :last-prim-kvs {:part_key "123" :sort_key "soil.ssurgo"}})
=> []
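The short last page and the empty call after it follow from exclusive-start-key paging. A small in-memory model of that behaviour, as a hypothetical toy (toy-scan and wtf-far-items are made-up names), not Faraday's or DynamoDB's implementation. It ignores DynamoDB's 1 MB per-request cap, under which a real page can come back shorter than :limit even when more items remain, so a short page alone does not prove the scan is finished.

```clojure
;; Toy in-memory model of the paging seen above (illustration only).
;; `items` is a vector in scan order; a call returns at most `limit` items
;; that come strictly after the item whose key attributes equal
;; `last-prim-kvs`, or from the start when `last-prim-kvs` is nil.
(defn toy-scan
  [items key-attrs {:keys [limit last-prim-kvs]}]
  (let [remaining (if last-prim-kvs
                    ;; skip up to and including the start key (exclusive start)
                    (rest (drop-while #(not= last-prim-kvs (select-keys % key-attrs))
                                      items))
                    items)]
    (vec (take limit remaining))))

;; The 10-item table from this answer (keys only), in the scan order shown above.
(def wtf-far-items
  (vec (for [p ["456" "123"]
             s ["fha.abs" "fha.rank" "fha.raw" "fha.true-color" "soil.ssurgo"]]
         {:part_key p :sort_key s})))
```

Starting after {:part_key "123" :sort_key "fha.raw"} yields the same 2-item last page, and starting after the final key yields an empty vector.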

So here it is end to end, as long as everything fits in memory.

(loop [accum []
       ;; first page: no :last-prim-kvs, so the scan starts at the beginning
       page (far/scan
              client-opts
              "users.robert.kuhar.wtf_far"
              {:limit 4
               :span-reqs {:max 1}})]
  ;; an empty page means the scan has run past the end of the table
  (if (empty? page)
    accum
    (let [last-on-page (last page)
          last-part-key (:part_key last-on-page)
          last-sort-key (:sort_key last-on-page)]
      (recur
        (into accum page)
        ;; resume the scan right after the last item on this page
        (far/scan
          client-opts
          "users.robert.kuhar.wtf_far"
          {:limit 4
           :span-reqs {:max 1}
           :last-prim-kvs {:part_key last-part-key :sort_key last-sort-key}})))))
=>
[{:part_key "456", :sort_key "fha.abs", :note "\"456\",\"fha.abs\" created 2016-12-08T21:32:20.789Z."}
 ...
 {:part_key "123", :sort_key "soil.ssurgo", :note "\"123\",\"soil.ssurgo\" created 2016-12-08T21:24:30.139Z."}]
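The loop above keeps every page in accum, hence the "fits in memory" caveat. A minimal sketch of a lazy alternative, assuming fetch-page is a function you supply (scan-seq and fetch-page are hypothetical names): it receives nil for the first page, or the previous page's last key, and returns that page of items. Pages are fetched only as the sequence is consumed.

```clojure
;; Hypothetical lazy variant of the loop above. `fetch-page` takes nil for the
;; first page, or the previous page's last key, and returns a page of items.
(defn scan-seq
  [fetch-page key-attrs]
  (letfn [(step [start-key]
            (lazy-seq
              (let [page (fetch-page start-key)]
                ;; stop on an empty page, as the loop above does
                (when (seq page)
                  ;; emit this page; the next page is fetched only when needed
                  (concat page
                          (step (select-keys (last page) key-attrs)))))))]
    (step nil)))
```

With Faraday, fetch-page could wrap far/scan using only the options shown above, e.g. (fn [start-key] (far/scan client-opts "users.robert.kuhar.wtf_far" (cond-> {:limit 4 :span-reqs {:max 1}} start-key (assoc :last-prim-kvs start-key)))). Consuming the result with run! and a hypothetical process-item function avoids holding the whole table, as long as nothing retains the head of the sequence.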

I think the sad final answer to "How can I get faraday/scan to traverse the whole DynamoDB table?" is that it doesn't. You need to build it by hand.