Question

;; Suppose we want to compute the min and max of a collection.
;; Ideally there would be a way to tell Clojure that we want to perform
;; only one scan, which will theoretically save a little time  

;; First we define some data to test with
;; 10MM element lazy-seq
(def data (for [x (range 10000000)] (rand-int 100)))

;; Realize the lazy-seq 
(dorun data)

;; Here is the amount of time it takes to go through the data once
(time (apply min data))
==> "Elapsed time: 413.805 msecs"

;; Here is the time to calc min, max by explicitly scanning twice
(time (vector (apply min data) (apply max data)))
==> "Elapsed time: 836.239 msecs"

;; Shouldn't this be more efficient since it's going over the data once?
(time (apply (juxt min max) data))
==> "Elapsed time: 833.61 msecs"

Chuck, here are my results after using your solution:

test.core=> (def data (for [x (range 10000000)] (rand-int 100)))
#'test.core/data

test.core=> (dorun data)
nil

test.core=> (realized? data)
true

test.core=> (defn minmax1 [coll] (vector (apply min coll) (apply max coll)))    
#'test.core/minmax1

test.core=> (defn minmax2 [[x & xs]] (reduce (fn [[tiny big] n] [(min tiny n) (max big n)]) [x x] xs))    
#'test.core/minmax2

test.core=> (time (minmax1 data))
"Elapsed time: 806.161 msecs"
[0 99]

test.core=> (time (minmax2 data))
"Elapsed time: 6072.587 msecs"
[0 99]

Answer 1

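This doesn't exactly answer your general question (i.e. how to scan Clojure data structures), but it is worth noting that this kind of code is usually better suited to specialised data structures/libraries if you really care about performance.

e.g. using core.matrix / vectorz-clj and a little bit of cheeky Java interop: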

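;; assumed setup, not shown in the original answer:
;; core.matrix and vectorz-clj must be on the classpath
(use 'clojure.core.matrix)
(import 'mikera.vectorz.Vectorz)

;; define the raw data
(def data (for [x (range 10000000)] (rand-int 100)))

;; convert to a Vectorz array
(def v (array :vectorz data))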

(time (Vectorz/minValue v))
"Elapsed time: 18.974904 msecs"
0.0

(time (Vectorz/maxValue v))
"Elapsed time: 21.310835 msecs"
99.0

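i.e. this is about 20-50 times faster than the original code given in the question.

I doubt you will get anywhere close to this with any code that relies on scanning regular Clojure vectors, whether you do it in one pass or otherwise. Basically: use the right tool.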

Answer 2

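The code that juxt executes is almost exactly equivalent to your hand-written version: ((juxt f g) x) literally means [(f x) (g x)]. It does not perform any clever optimization over the collection.

To do what you want, I think the simplest approach is a simple fold over the collection: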
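A small sketch of that equivalence, using only clojure.core. The name min-and-max is made up for illustration:

```clojure
;; juxt returns a function that calls each given function on the same
;; arguments and collects the results in a vector.
(def min-and-max (juxt min max))   ; hypothetical name, for illustration only

;; Equivalent to [(min 3 1 2) (max 3 1 2)]
(min-and-max 3 1 2)
;; => [1 3]

;; With apply, the collection is spread into arguments, and then
;; min and max each walk those arguments separately.
(apply min-and-max [3 1 2])
;; => [1 3]
```

So applying (juxt min max) to a collection still visits the elements once for min and once for max, which matches the near-identical timings in the question.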

(defn minmax [[x & xs]]
  (reduce 
    (fn [[tiny big] n] [(min tiny n) (max big n)]) 
    [x x]
    xs))
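The fold above builds a new two-element vector and destructures it for every element, which may account for some of the overhead the asker measured. As a sketch only, the same single pass can be written with loop/recur so the two running values are kept as separate loop bindings. The name minmax-loop is hypothetical, and this version has not been benchmarked against the timings in the question:

```clojure
(defn minmax-loop
  "Single pass over coll. Returns [min max], or nil for an empty coll.
  Hypothetical helper, not part of the original answer."
  [coll]
  (when-let [s (seq coll)]
    (loop [lo   (first s)    ; smallest value seen so far
           hi   (first s)    ; largest value seen so far
           more (next s)]    ; remaining elements, nil when exhausted
      (if more
        (let [n (first more)]
          (recur (min lo n) (max hi n) (next more)))
        [lo hi]))))

(minmax-loop [5 3 9 1 7])
;; => [1 9]
```

The result vector is allocated once at the end rather than once per element; whether that makes it competitive with two apply passes on a 10MM element seq would need to be measured.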

In Clojure, can I optimize scans?

2 answers: