Clojure: Reducing a large lazy collection eats up memory

Date: 2017-12-15 12:36:42

Tags: performance memory clojure sequence lazy-evaluation

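I'm new to Clojure. I have a function that creates an infinite lazy sequence of numbers. Each number in the sequence depends on the previous calculation; I'm using reductions because I need all the intermediate results.

The generator is defined like this: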

(defn generator [seed factor]
  (drop 1 (reductions 
            (fn [acc _] (mod (* acc factor) 2147483647))
            seed 
            ; using dummy infinite seq to keep the reductions going
            (repeat 1))))
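Because each value depends only on the previous one, the same sequence can also be expressed with clojure.core/iterate, without the dummy (repeat 1) input. A minimal sketch; generator-iterate is a hypothetical name, and the original generator is repeated so the block runs on its own:

```clojure
;; The question's definition, repeated so this block is self-contained.
(defn generator [seed factor]
  (drop 1 (reductions
            (fn [acc _] (mod (* acc factor) 2147483647))
            seed
            (repeat 1))))

;; Hypothetical alternative built on iterate.
;; iterate yields seed, f(seed), f(f(seed)), ...
;; rest drops the seed itself, like (drop 1 ...) above.
(defn generator-iterate [seed factor]
  (rest (iterate #(mod (* % factor) 2147483647) seed)))

;; Usage: (take 3 (generator-iterate 59 16807))
```

This only changes how the sequence is built. Like the reductions version, the result is a lazy seq whose realized elements stay in memory for as long as something refers to its head.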

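I then want to compare n consecutive results from two such sequences, for large n, and count how many times they are equal. I instantiate the two generators like so: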

(def gen-a (generator 59 16807))
(def gen-b (generator 393 48271))

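At first I did something like this:

(defn run []
  (->> (interleave gen-a gen-b)
       (partition 2)
       (take 40000000)
       (filter #(apply = %))
       (count)))

This took far too long, and I saw the program's memory usage spike to about 4 GB. Adding some println calls showed that it got really slow after about 10 million iterations, so I thought that maybe count needed to hold the entire sequence in memory, and I changed it to use reduce instead.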


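Still, it was allocating a lot of memory and slowing down significantly after the first several million items. I'm pretty sure it's keeping the entire lazy sequence in memory, but I'm not sure why, so I tried to throw away the head manually.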


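Again, the result was the same. This stumped me, because I've read that all of the functions I use to build the sequence are lazy, and since I only ever use the current item, I expected it to run in constant memory.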

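Coming from a Python background, I'm basically trying to emulate Python generators. I'm probably missing something obvious, so I'd really appreciate some pointers. Thanks!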

3 Answers

Answer 0 (score: 6)

A generator is not a (lazy) sequence.
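To make the distinction concrete: a Python generator is a stateful object that produces each value once and keeps no history, whereas a Clojure lazy seq caches every element it realizes, and those elements stay reachable for as long as something refers to the head. The generator side can be sketched in plain Clojure as a closure over an atom; make-gen is a hypothetical helper, not part of clojure.core:

```clojure
;; make-gen is a hypothetical helper for illustration only.
;; It returns a zero-argument function; each call advances the
;; internal state and returns the new value, similar to calling
;; next() on a Python generator. Only the current value is kept.
(defn make-gen [seed factor]
  (let [state (atom seed)]
    (fn []
      ;; swap! applies the update and returns the new state
      (swap! state #(mod (* % factor) 2147483647)))))

;; Usage: three successive values from one generator; these are the
;; same numbers as (take 3 (generator 59 16807)).
(let [g (make-gen 59 16807)]
  [(g) (g) (g)])
```

Unlike a lazy seq, such a generator cannot be re-read or shared: whoever calls it next gets the following value.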

You are holding on to the head here:

(def gen-a (generator 59 16807))
(def gen-b (generator 393 48271))

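gen-a and gen-b are global vars that refer to the heads of the sequences. Because each var keeps a reference to the first element, nothing realized while walking the sequence can be garbage-collected, so memory use grows with every item consumed.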

You probably want something like this:

(defn run []
  (->> (interleave (generator 59 16807) (generator 393 48271))
       (partition 2)
       (take 40000000)
       (filter #(apply = %))
       (count)))
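A variation on this run, assuming generator is defined as in the question: map = walks the two sequences in lockstep and yields one boolean per pair, which removes the interleave/partition step. run-map is a hypothetical name, and n is passed as an argument instead of being hard-coded:

```clojure
;; generator as defined in the question
(defn generator [seed factor]
  (drop 1 (reductions
            (fn [acc _] (mod (* acc factor) 2147483647))
            seed
            (repeat 1))))

;; Hypothetical variant. map with two collections calls = on each
;; pair of elements, producing true/false per pair. Both sequences
;; are created inside the function and never bound to a var, so
;; nothing outside the call retains their heads.
(defn run-map [n]
  (->> (map = (generator 59 16807) (generator 393 48271))
       (take n)
       (filter true?)
       (count)))

;; Usage: (run-map 40000000)
```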

Alternatively, define gen-a and gen-b as functions:

(defn gen-a
  []
  (generator 59 16807))
...

(defn run []
  (->> (interleave (gen-a) (gen-b))
       (partition 2)
       (take 40000000)
       (filter #(apply = %))
       (count)))
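If only the count is needed, the sequences can be skipped entirely: a loop/recur sketch keeps just the two current values in loop locals, so memory use does not grow with n. next-val and count-matches are hypothetical names; the sketch compares full values, as the question's run does, and no timings are claimed for it:

```clojure
;; next-val and count-matches are hypothetical names for this sketch.
(defn next-val [acc factor]
  (mod (* acc factor) 2147483647))

;; Walks both generators n steps in lockstep and counts equal pairs.
;; Only i, a, b and total survive between iterations, so no sequence
;; (and therefore no head) is ever retained.
(defn count-matches [n seed-a factor-a seed-b factor-b]
  (loop [i 0, a seed-a, b seed-b, total 0]
    (if (< i n)
      (let [a' (next-val a factor-a)
            b' (next-val b factor-b)]
        (recur (inc i) a' b' (if (= a' b') (inc total) total)))
      total)))

;; The question's workload would be:
;; (count-matches 40000000 59 16807 393 48271)
```

The test uses (< i n) rather than (= i n) so that a floating-point n such as 4e7 still terminates, since (= 1 1.0) is false in Clojure.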

Answer 1 (score: -1)

Instead of using reductions, you can build a lazy sequence directly. This answer uses lazy-cons from the Tupelo library (you could also use lazy-seq from clojure.core).

(ns tst.demo.core
  (:use tupelo.test)
  (:require
    [tupelo.core :as t]  ))

(defn rand-gen
  [seed factor]
  (let [next (mod (* seed factor) 2147483647)]
    (t/lazy-cons next (rand-gen next factor))))

(defn run2 [num-rand]
  (->> (interleave
         ; restrict to [0..99] to simulate bad rand #'s
         (map #(mod % 100) (rand-gen 59 16807))
         (map #(mod % 100) (rand-gen 393 48271)))
       (partition 2)
       (take num-rand)
       (filter #(apply = %))
       (count)))

(t/spyx (time (run2 1e5))) ; expect ~1% will overlap => 1e3
(t/spyx (time (run2 1e6))) ; expect ~1% will overlap => 1e4
(t/spyx (time (run2 1e7))) ; expect ~1% will overlap => 1e5

Results:

"Elapsed time:   90.42 msecs"  (time (run2   100000.0)) =>   1025 
"Elapsed time:  862.60 msecs"  (time (run2  1000000.0)) =>   9970
"Elapsed time: 8474.25 msecs"  (time (run2      1.0E7)) => 100068

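Note that the execution times are roughly 4x faster than with the lazy-gen generator version (compare the timings in the next answer), since we have cut out the generator functions that we weren't really using anyway.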
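The answer notes that clojure.core/lazy-seq works in place of lazy-cons. A minimal sketch of that substitution, keeping the answer's rand-gen and run2 names so it can replace them in a namespace that does not use Tupelo; no timings are claimed for this version:

```clojure
;; Sketch: rand-gen with clojure.core/lazy-seq instead of Tupelo's
;; lazy-cons. lazy-seq delays its body until the seq is first walked,
;; so the recursive call does not grow the stack.
(defn rand-gen [seed factor]
  (lazy-seq
    (let [nxt (mod (* seed factor) 2147483647)]
      (cons nxt (rand-gen nxt factor)))))

;; The same pipeline as run2 above, unchanged apart from this rand-gen.
(defn run2 [num-rand]
  (->> (interleave
         ; restrict to [0..99] to simulate bad rand #'s
         (map #(mod % 100) (rand-gen 59 16807))
         (map #(mod % 100) (rand-gen 393 48271)))
       (partition 2)
       (take num-rand)
       (filter #(apply = %))
       (count)))
```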

Answer 2 (score: -2)

You can get Python-style generator functions in Clojure using the Tupelo library. Just use lazy-gen and yield, like so:

(ns tst.demo.core
  (:use tupelo.test)
  (:require
    [tupelo.core :as t]  ))

(defn rand-gen
  [seed factor]
  (t/lazy-gen
    (loop [acc seed]
      (let [next (mod (* acc factor) 2147483647)]
        (t/yield next)
        (recur next)))))

(defn run2 [num-rand]
  (->> (interleave
         ; restrict to [0..99] to simulate bad rand #'s
         (map #(mod % 100) (rand-gen 59 16807))
         (map #(mod % 100) (rand-gen 393 48271)))
       (partition 2)
       (take num-rand)
       (filter #(apply = %))
       (count)))

(t/spyx (time (run2 1e5))) ; expect ~1% will overlap => 1e3
(t/spyx (time (run2 1e6))) ; expect ~1% will overlap => 1e4
(t/spyx (time (run2 1e7))) ; expect ~1% will overlap => 1e5

Results:

"Elapsed time:   409.697922 msecs"   (time (run2 100000.0))   =>   1025
"Elapsed time:  3250.592798 msecs"   (time (run2 1000000.0))  =>   9970
"Elapsed time: 32995.194574 msecs"   (time (run2 1.0E7))      => 100068