Clojure - a more efficient 'min-by' function and performance analysis

Date: 2017-06-09 09:55:32

Tags: clojure

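I'm new to Clojure; I started learning the language two months ago. I'm reading the book "The Joy of Clojure" and found a min-by function in the Functional Programming chapter. After some thought I wrote my own min-by function, which seems to be at least 50% faster with 10,000 items. Here are the functions: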


; the test vector with random data
(def my-rand-vec (vec (take 10000 (repeatedly #(rand-int 10000)))))

; the joy of clojure min-by
(defn min-by-reduce [f coll]
  (when (seq coll)
    (reduce (fn [min other]
              (if (> (f min) (f other))
                other
                min))
            coll)))

(time (min-by-reduce eval my-rand-vec))

; my poor min-by 
(defn min-by-sort [f coll]
  (first (sort (map f coll))))

(time (min-by-sort eval my-rand-vec))

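Terminal output:

"Elapsed time: 91.657505 msecs"
"Elapsed time: 62.441513 msecs"

Does my solution have any performance or resource drawbacks? I'm very curious what more elegant Clojure solution the Clojure gurus would propose for this function.

Edit

Clearer test code using criterium: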

(ns min-by.core
  (:require [criterium.core :refer [quick-bench]])
  (:gen-class))

(defn min-by-reduce [f coll]
  (when (seq coll)
    (reduce (fn [min other]
              (if (> (f min) (f other))
                other
                min))
            coll)))


(defn min-by-sort [f coll]
  (first (sort-by f coll)))

(defn my-rand-map [length]
  (map #(hash-map :resource %1 :priority %2) 
      (take length (repeatedly #(rand-int 200)))
      (take length (repeatedly #(rand-int 10)))))


(defn -main
  [& args]
  (let [rand-map (my-rand-map 100000)]
    (println "min-by-reduce-----------")
    (quick-bench (min-by-reduce :resource rand-map))
    (println "min-by-sort-------------")
    (quick-bench (min-by-sort :resource rand-map))
    (println "min-by-min-key----------")
    (quick-bench (apply min-key :resource rand-map))))

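4 Answers:

Answer 0 (score: 1)

First, your version returns (f min) rather than min. Second, in theory, finding the minimum is a linear O(n) operation, whereas sorting and taking the first element is quasilinear O(n log n). For small vectors it can be hard to get accurate timing results, and for that matter, time complexity does not guarantee that a quasilinear operation is always slower than a linear one!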
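To make the first point concrete, here is a minimal sketch using a small hand-made vector of maps (the sample data and the name `items` are assumptions for illustration, not taken from the question). Mapping the key function before sorting yields the smallest key, while `sort-by` keeps the original elements:

```clojure
;; Hypothetical sample data with distinct :resource values.
(def items [{:resource 7 :priority 1}
            {:resource 3 :priority 9}
            {:resource 5 :priority 4}])

;; Sorting the mapped keys loses the original elements.
(def smallest-key (first (sort (map :resource items))))

;; sort-by orders the original elements by their key instead.
(def smallest-item (first (sort-by :resource items)))

(println smallest-key)   ; 3
(println smallest-item)  ; {:resource 3, :priority 9}
```

If the caller needs the whole map (for example to read `:priority` afterwards), only the second form is usable.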

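Try a sample of 1,000,000 items or more, and use a more complex key function. For example, generate sample strings and order them by length or something similar. That way you get results closer to real-world use.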
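One possible way to set up such a test is sketched below. The helper `rand-string`, the alphabet, and the sizes are assumptions made for illustration; raise `n` to 1,000,000 for a meaningful measurement.

```clojure
;; Hypothetical helper: a random lowercase string of length 1..max-len.
(defn rand-string [max-len]
  (apply str (repeatedly (inc (rand-int max-len))
                         #(rand-nth "abcdefghijklmnopqrstuvwxyz"))))

(def n 1000) ; kept small here; use 1000000 for a real benchmark

;; doall forces the lazy sequence so that generation is not timed later.
(def sample-strings (doall (repeatedly n #(rand-string 50))))

;; Linear scan versus sort, both keyed by string length.
(def shortest-by-scan (apply min-key count sample-strings))
(def shortest-by-sort (first (sort-by count sample-strings)))

(time (apply min-key count sample-strings))
(time (first (sort-by count sample-strings)))
```

When several strings share the minimal length, the two approaches may return different strings of that same length, so compare the lengths rather than the strings themselves.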

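Hint: instead of eval you can use identity to "skip" the key function for testing purposes. It probably doesn't affect the benchmark much, but just so you know that function exists.

As user ClojureMostly pointed out, eval is a big bottleneck and skews the benchmark toward the wrong conclusion.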
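A small sketch of that substitution, using a made-up vector `nums` (an assumption for illustration). For plain numbers `eval` and `identity` return the same value, so the result is unchanged while the cost of running the evaluator on every element goes away:

```clojure
;; Hypothetical test data with distinct values.
(def nums [42 7 19 3 88])

;; For a number, eval simply returns the number,
;; but every call goes through Clojure's evaluator.
(def via-eval (apply min-key eval nums))

;; identity returns its argument directly.
(def via-identity (apply min-key identity nums))

(println via-eval via-identity) ; 3 3
```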

Answer 1 (score: 0)

I believe JoC was just trying to illustrate the use of reduce, nothing more.

I also read JoC as my first book, but I wish I had saved it until I had read a more introductory book first. There are many good ones out there. You can even read (most of) Clojure for the Brave and True online: http://www.braveclojure.com/clojure-for-the-brave-and-true/ I also recommend buying the full hard-copy version.

You should also check out the Clojure Cookbook: https://github.com/clojure-cookbook/clojure-cookbook As before, I also recommend buying the full hard-copy version.

Answer 2 (score: 0)

I've changed the min-by-sort function, and the whole picture has changed.

(defn min-by-sort [f coll]
  (first (sort-by f coll)))

The terminal output of the two functions for 10,000 items is

"Elapsed time: 0.863016 msecs"
"Elapsed time: 11.44852 msecs"

So the question is: is there a better or more elegant min-by-xxx function that finds the minimum of a collection according to a key function and returns the original value?

And a final test with maps and a keyword as the key function.

; find the element with the smallest (f x) using reduce
(defn min-by-reduce [f coll]
  (when (seq coll)
    (reduce (fn [min other]
              (if (> (f min) (f other))
                other
                min))
            coll)))

; find the element with the smallest (f x) using sort-by
(defn min-by-sort [f coll]
  (first (sort-by f coll)))

; a helper function to build a sequence of {:resource x, :priority y} maps
(defn my-rand-map [length]
  (map #(hash-map :resource %1 :priority %2) 
      (take length (repeatedly #(rand-int 200)))
      (take length (repeatedly #(rand-int 10)))))

; test with 100 items in the seq
(let [rand-map (my-rand-map 100)]
  (time (min-by-reduce :resource rand-map))
  (time (min-by-sort :resource rand-map)))

Test 100 items
"Elapsed time: 0.245403 msecs"
"Elapsed time: 0.18094 msecs"

Test 1000 items
"Elapsed time: 2.653952 msecs"
"Elapsed time: 3.214373 msecs"

Test 10,000 items
"Elapsed time: 14.275679 msecs"
"Elapsed time: 38.064996 msecs"

I think the difference is that sort-by also has to put all the items in order, whereas reduce just walks through the items once and keeps track of the current minimum. Is that right?
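One way to check this intuition is to count how often the key function is called. The sketch below repeats both functions so that it is self-contained and wraps the key in a hypothetical `counting` helper (the helper, the atoms, and the sample data are assumptions for illustration). The reduce version calls the key twice per step, so 2(n - 1) times in total; sort-by calls it twice per comparison, and a comparison sort needs at least n - 1 comparisons.

```clojure
(defn min-by-reduce [f coll]
  (when (seq coll)
    (reduce (fn [m other] (if (> (f m) (f other)) other m))
            coll)))

(defn min-by-sort [f coll]
  (first (sort-by f coll)))

;; Hypothetical helper: wraps f so that every call bumps the counter atom.
(defn counting [f counter]
  (fn [x] (swap! counter inc) (f x)))

;; 1000 distinct numbers in random order.
(def data (shuffle (range 1000)))

(def reduce-calls (atom 0))
(def sort-calls (atom 0))

(min-by-reduce (counting identity reduce-calls) data)
(min-by-sort (counting identity sort-calls) data)

(println "reduce key calls: " @reduce-calls) ; 1998, i.e. 2 * (1000 - 1)
(println "sort-by key calls:" @sort-calls)   ; depends on the input order
```

On shuffled input the sort-by count is typically several times larger, which matches the growing gap in the timings above.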

Answer 3 (score: 0)

The standard min-key needs only a slight adjustment:

(defn min-by [f coll]
  (when (seq coll)
    (apply min-key f coll)))

If you look at the source code for min-key, it is essentially the same as JoC's min-by-reduce.
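A short usage sketch of this min-by with made-up data (the `tasks` vector is an assumption for illustration). The `(when (seq coll) ...)` guard matters because `min-key` needs at least one item after the key function, so applying it to an empty collection throws an arity exception:

```clojure
(defn min-by [f coll]
  (when (seq coll)
    (apply min-key f coll)))

;; Hypothetical sample data with distinct keys.
(def tasks [{:resource 12 :priority 2}
            {:resource 4 :priority 7}
            {:resource 9 :priority 1}])

(println (min-by :resource tasks)) ; {:resource 4, :priority 7}
(println (min-by :priority tasks)) ; {:resource 9, :priority 1}
(println (min-by :resource []))    ; nil
```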