Question

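I'm trying to sum the primes for a Project Euler problem using a sieve algorithm. I'm using a mutable set to store the numbers that are not prime, and 'dosync' and 'commute' to update the set (otherwise, if the set were immutable, I would run out of memory). Performance grows roughly linearly up to a limit of 1.2 million, but it is terrible at 1.5 million (7 seconds vs. 64 seconds). Any ideas what I'm doing wrong? My guess is that either the numbers are getting too large, or updating the mutable set is inefficient.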

(defn mark-multiples [not-prime-set multiple prime-max]
  ;; Add 2*multiple, 3*multiple, ... up to prime-max to the ref'd set,
  ;; using one transaction per element.
  (loop [counter (long 2)
         product (* counter multiple)]
    (if (> product prime-max)
      not-prime-set
      (do
        (dosync (commute not-prime-set conj product))
        (recur (inc counter) (* (inc counter) multiple))))))


(defn sieve-summation [prime-max]
  (def not-prime-set (ref #{(long 1)}))
  (loop [counter (long 2)
         summation (long 0)]
    (if (> counter prime-max)
      summation
      (if (not (contains? @not-prime-set counter))
        (do
          ;; counter is prime: cross off its multiples, then add it to the sum
          (mark-multiples not-prime-set counter prime-max)
          (recur (inc counter) (+ summation counter)))
        (recur (inc counter) summation)))))

=> (time (sieve-summation 100000))
"Elapsed time: 496.673 msecs"
454396537

=> (time (sieve-summation 150000))
"Elapsed time: 763.333 msecs"
986017447

=> (time (sieve-summation 1000000))
"Elapsed time: 6037.926 msecs"
37550402023

=> (time (sieve-summation 1100000))
"Elapsed time: 6904.385 msecs"
45125753695

=> (time (sieve-summation 1200000))
"Elapsed time: 7321.299 msecs"
53433406131

=> (time (sieve-summation 1500000))
"Elapsed time: 64995.216 msecs"
82074443256

---- EDIT ----

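Thanks A. Webb, great suggestion! Your code was still a bit slow, so to speed it up I made the not-prime set transient right from the start, and now it runs much faster (about 8 times). I still get out-of-memory errors, so I'll try to figure out how to increase the JVM heap size to see whether that fixes it. I'm running Clojure in Eclipse on a Mac, and I'm new to both Clojure and Macs.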

I'd love to know how the program could be refactored further (keeping mostly the same logic) to be more elegant Clojure. Thanks again.

(defn mark-multiples2 [not-prime-set prime prime-max]
  ;; not-prime-set is a transient; always thread the value returned by conj!
  (loop [multiple (* 2 prime), nps not-prime-set]
    (if (> multiple prime-max)
      nps
      (recur (+ multiple prime) (conj! nps multiple)))))


(defn sieve-summation2 [prime-max]
  ;; The set stays transient for the whole run; it never escapes this loop.
  (loop [counter 2, summation 0, not-prime-set (transient #{1})]
    (if (> counter prime-max)
      summation
      (if (not-prime-set counter)   ; transient sets can be called as functions
        (recur (inc counter) summation not-prime-set)
        (recur (inc counter)
               (+ summation counter)
               (mark-multiples2 not-prime-set counter prime-max))))))

=> (time (sieve-summation2 100000))
"Elapsed time: 124.781 msecs"
454396537

=> (time (sieve-summation 100000))
"Elapsed time: 876.744 msecs"
454396537

Answer 1

There are better, more elegant ways to solve this problem in Clojure, but that isn't the point of your question.

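Using a reference type, whether a ref or (more appropriately) an atom, isn't buying you anything here. You are still creating just as much garbage: you are merely swapping the contents of a mutable storage location from one immutable data structure to another. I don't know what is causing your timing to spike, but one possibility is that you are triggering a long garbage-collection cycle.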
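To make the point about reference types concrete, here is a small sketch using an atom; a ref updated with `dosync`/`commute` behaves the same way with respect to the values it holds. The names `not-primes` and `before` are illustrative only.

```clojure
;; Illustrative names only. An atom (like a ref) just points at an immutable value.
(def not-primes (atom #{1}))

;; Capture the set the atom currently holds.
(def before @not-primes)

;; swap! computes (conj #{1} 4 6), a brand-new persistent set,
;; and repoints the atom at it.
(swap! not-primes conj 4 6)

(= before #{1})           ;=> true, the old set is untouched
(= @not-primes #{1 4 6})  ;=> true, a new set was allocated
```

Each `conj` in the original loop therefore allocates a new set (sharing structure with the old one), and the previous version becomes garbage once nothing refers to it; the ref only adds transaction overhead on top of that.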

What you want here are transients. Without changing your code much, the following should give a significant speedup.

(defn mark-multiples [not-prime-set multiple prime-max]
  ;; Mutate a transient copy locally, then hand back an ordinary persistent set.
  (loop [m (* 2 multiple), nps (transient not-prime-set)]
    (if (> m prime-max)
      (persistent! nps)
      (recur (+ m multiple) (conj! nps m)))))


(defn sieve-summation [prime-max]
  (loop [counter 2, summation 0, not-prime-set #{1}]
    (if (> counter prime-max) 
      summation
      (if (contains? not-prime-set counter) 
        (recur (inc counter) summation not-prime-set)
        (recur (inc counter) 
               (+ summation counter) 
               (mark-multiples not-prime-set counter prime-max))))))
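The `loop` in `mark-multiples` above follows the usual transient lifecycle: `transient`, then `conj!` (always continuing with the value it returns), then `persistent!`. Here is the same pattern in its smallest form, as a sketch with a hypothetical helper `evens-up-to`. This is essentially what `into` does internally for sets, which is why an `into`-based version needs no explicit transients.

```clojure
(defn evens-up-to
  "Hypothetical helper: the set of even numbers in [2, limit],
  built through a transient."
  [limit]
  ;; transient -> conj! (each returned value feeds the next step) -> persistent!
  (persistent!
    (reduce conj! (transient #{}) (range 2 (inc limit) 2))))

(= (evens-up-to 10) #{2 4 6 8 10})  ;=> true
```

`reduce` passes each value returned by `conj!` into the next step; calling `conj!` only for its side effect and ignoring the result is not a supported way to use transients.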

Here is the same algorithm in a more idiomatic style:

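(defn mark [s n m]
  ;; add the multiples 2n, 3n, ... below m to the set s
  (into s (range (* 2 n) m n)))

(defn prime-sum [m]
  ;; sum of the primes below m; the reduce state is [running-sum non-primes]
  (let [step (fn [[a s] n]
               (if (s n)
                 [a s]
                 [(+ a n) (mark s n m)]))]
    (first (reduce step [0 #{}] (range 2 m)))))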

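From here, you might start attacking the inherent memory problem of the algorithm: you are storing all the non-primes, whereas at any given point you only need to store the next non-prime, that is, the next pending multiple of each prime found so far. For an implementation of this idea, see Christophe Grand's blog entry Everybody loves the Sieve of Eratosthenes.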
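As a rough sketch of that idea (not Christophe Grand's code), an incremental sieve can keep a map from each upcoming composite to the primes that generate it, so the table holds one pending multiple per prime found so far instead of every composite. The function name `incremental-primes` is hypothetical.

```clojure
(defn incremental-primes
  "Lazy sequence of primes. `table` maps each upcoming composite number
  to the primes whose next multiple it is."
  ([] (incremental-primes 2 {}))
  ([n table]
   (if-let [factors (get table n)]
     ;; n is composite: move each of its primes on to that prime's next multiple.
     (recur (inc n)
            (reduce (fn [t p] (update-in t [(+ n p)] conj p))
                    (dissoc table n)
                    factors))
     ;; n is prime: emit it and schedule n*n as its first multiple to cross off
     ;; (every smaller multiple of n also has a smaller prime factor).
     (lazy-seq
       (cons n (incremental-primes (inc n)
                                   (update-in table [(* n n)] conj n)))))))

;; Sum of the primes below 100000; should match the question's 454396537.
(reduce + (take-while #(< % 100000) (incremental-primes)))
```

Because the sequence is lazy, there is no fixed upper bound to choose in advance; `take-while` decides where to stop.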

Answer 2

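I was using the Counterclockwise plugin for Eclipse, which I believe was using Clojure 1.5 because I had JDK 1.6. I upgraded to JDK 1.7 and updated project.clj to use Clojure 1.6.0, and neither memory nor speed is an issue anymore. Thanks for the suggestions.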
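For reference, a minimal Leiningen `project.clj` along those lines might look like the sketch below. The project name is a placeholder, and the `:jvm-opts` entry (passing `-Xmx`, the standard JVM flag for the maximum heap size) is only needed if memory is still tight; whether Counterclockwise picks it up assumes the REPL is launched through Leiningen.

```clojure
;; project.clj (Leiningen). "euler-sieve" is a placeholder project name.
(defproject euler-sieve "0.1.0-SNAPSHOT"
  ;; Pin the Clojure version the project runs on.
  :dependencies [[org.clojure/clojure "1.6.0"]]
  ;; Extra JVM flags; -Xmx2g raises the maximum heap to 2 GB (example value).
  :jvm-opts ["-Xmx2g"])
```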

Clojure performance for set updates and large computations
