Question

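The documentation of the pmap function leaves me wondering how efficient it would be for something like fetching a collection of XML feeds over the web. I have no idea how many concurrent fetch operations pmap would spawn, or what the maximum would be.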

Answer 1

If you look at the source, you'll see:

> (use 'clojure.repl)
> (source pmap)
(defn pmap
  "Like map, except f is applied in parallel. Semi-lazy in that the
  parallel computation stays ahead of the consumption, but doesn't
  realize the entire result unless required. Only useful for
  computationally intensive functions where the time of f dominates
  the coordination overhead."
  {:added "1.0"}
  ([f coll]
   (let [n (+ 2 (.. Runtime getRuntime availableProcessors))
         rets (map #(future (f %)) coll)
         step (fn step [[x & xs :as vs] fs]
                (lazy-seq
                 (if-let [s (seq fs)]
                   (cons (deref x) (step xs (rest s)))
                   (map deref vs))))]
     (step rets (drop n rets))))
  ([f coll & colls]
   (let [step (fn step [cs]
                (lazy-seq
                 (let [ss (map seq cs)]
                   (when (every? identity ss)
                     (cons (map first ss) (step (map rest ss)))))))]
     (pmap #(apply f %) (step (cons coll colls))))))

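(+ 2 (.. Runtime getRuntime availableProcessors)) is a big clue. pmap will take the first (+ 2 processors) pieces of work and run them asynchronously with future. So if you have 2 cores, it will start 4 pieces of work at a time, trying to stay a bit ahead of you, but the maximum should be 2 + n, where n is the number of processors.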
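A quick way to check that window on your own machine is a minimal sketch that just evaluates the same expression the source uses (the names cores and pmap-window are hypothetical, chosen for illustration):

```clojure
;; Number of processors the JVM reports on this machine.
(def cores (.. Runtime getRuntime availableProcessors))

;; pmap's look-ahead, copied from its source: cores + 2.
(def pmap-window (+ 2 cores))

(println cores "processors, so pmap runs about" pmap-window "futures ahead of the consumer")
```

On a 2-core machine this gives 4, matching the arithmetic above.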

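future ultimately uses the agent I/O thread pool, which supports an unbounded number of threads. The pool grows as work is thrown at it and shrinks when threads go unused.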
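A rough, timing-based sketch of that claim, using only clojure.core: 50 futures that each just sleep 200 ms (stand-ins for blocking fetches) should finish in about one sleep interval rather than 50, because the pool gives each one its own thread. task-count and sleep-ms are illustrative values, not anything from pmap itself.

```clojure
(def task-count 50)   ; illustrative number of blocking tasks
(def sleep-ms 200)    ; illustrative "network latency" per task

(def elapsed-ms
  (let [start   (System/nanoTime)
        ;; doall forces every future to be created (and started) right away
        futures (doall (for [i (range task-count)]
                         (future (Thread/sleep sleep-ms) i)))]
    (run! deref futures)                      ; block until all have finished
    (/ (- (System/nanoTime) start) 1e6)))     ; nanoseconds -> milliseconds

(println task-count "sleeping futures finished in" (long elapsed-ms) "ms")
```

When run as a standalone script, you may want to call (shutdown-agents) at the end so the JVM does not linger on the idle pool threads.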

Answer 2

Building on Alex's excellent answer explaining how pmap works, here is my suggestion for your situation:

;; Start one future per feed; doall forces all of them to be created now.
(doall
  (map
    #(future (my-web-fetch-function %))
    list-of-xml-feeds-to-fetch))
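To try this pattern without a network, here is a self-contained sketch in which fake-fetch and the example.com URLs are hypothetical stand-ins for my-web-fetch-function and list-of-xml-feeds-to-fetch:

```clojure
;; Hypothetical stand-in for my-web-fetch-function: sleeps to mimic
;; network latency, then returns a fake feed body.
(defn fake-fetch [url]
  (Thread/sleep 300)
  (str "<feed src=\"" url "\"/>"))

;; Hypothetical stand-in for list-of-xml-feeds-to-fetch.
(def feeds (map #(str "http://example.com/feed/" %) (range 10)))

;; The same pattern: one future per feed, all started immediately by doall.
(def in-flight
  (doall
    (map #(future (fake-fetch %)) feeds)))

;; Deref in order; each deref waits only for its own fetch.
(def bodies (mapv deref in-flight))

(println (count bodies) "feeds fetched; first:" (first bodies))
```

Because all ten fetches sleep at the same time, the whole batch should take roughly 300 ms instead of about 3 s.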

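Rationale:

- You want as much work in flight as possible, since most of it will be blocked on network IO.
- future fires off an asynchronous piece of work for each request, to be handled in a thread pool. You can let Clojure take care of that intelligently.
- The doall around the map forces evaluation of the full sequence (i.e., the launching of all requests).
- Your main thread can start dereferencing the futures right away, so it can keep making progress as individual results come back.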

Answer 3

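No time to write a long response, but there is a clojure.contrib http-agent that creates each GET/POST request as its own agent. So you can fire off a thousand requests, and they will all run in parallel and complete as their results arrive.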
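The monolithic clojure.contrib is no longer maintained, so as a rough analogue using only clojure.core agents (not the http-agent API itself), each request can get its own agent and run via send-off, which dispatches onto the same growing I/O pool that future uses. fake-fetch and the URLs are hypothetical:

```clojure
;; Hypothetical fetch stand-in; this is NOT the clojure.contrib http-agent API.
(defn fake-fetch [url]
  (Thread/sleep 200)
  {:url url :status 200})

(def urls (map #(str "http://example.com/item/" %) (range 20)))

;; One agent per request. send-off dispatches the action on the growing
;; I/O pool, so all twenty fetches run concurrently.
(def fetch-agents
  (doall
    (for [u urls]
      (let [a (agent {:url u})]
        (send-off a (fn [state] (fake-fetch (:url state))))
        a))))

;; Block the calling thread until every agent has run its action.
(apply await fetch-agents)

(def results (mapv deref fetch-agents))
(println (count results) "requests completed")
```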

Answer 4

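Looking at how pmap operates, it seems to run 32 threads at a time no matter how many processors you have. The issue is that map runs 32 elements ahead of the computation, and the futures start on their own. Sample:

(defn samplef [n]
  (println "starting " n)
  (Thread/sleep 10000)
  n)
(def result (pmap samplef (range 0 100)))

You will wait 10 seconds and see 32 prints, then when you take the 33rd, another 32. This shows that you are running 32 concurrent threads at a time, which for me is not ideal.

Regards, Felipe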
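The 32 in this observation comes from chunked sequences rather than from pmap's own window: (range 0 100) is realized in chunks of 32, so the map inside pmap creates 32 futures at once. A minimal sketch that compares a chunked range with the same range re-wrapped one element at a time (unchunk and started-after-first are hypothetical helpers, and the exact counts depend on timing and core count):

```clojure
;; Hypothetical helper: re-wraps a seq so it is realized one element
;; at a time instead of in 32-element chunks.
(defn unchunk [s]
  (lazy-seq
    (when-let [[x & xs] (seq s)]
      (cons x (unchunk xs)))))

(defn started-after-first
  "Runs pmap over coll with a slow function and returns how many calls
  had started once the first result was consumed."
  [coll]
  (let [started (atom 0)
        slow-f  (fn [n]
                  (swap! started inc)   ; count each call as it begins
                  (Thread/sleep 200)    ; simulate a slow fetch
                  n)
        result  (pmap slow-f coll)]
    (first result)        ; consume only the first element
    (Thread/sleep 500)    ; give already-created futures time to start
    @started))

(def window (+ 2 (.. Runtime getRuntime availableProcessors)))
(def chunked-started   (started-after-first (range 0 100)))
(def unchunked-started (started-after-first (unchunk (range 0 100))))

(println "pmap window:" window)
(println "calls started, chunked range:  " chunked-started)
(println "calls started, unchunked range:" unchunked-started)
```

With the unchunked input, the number of calls started should stay close to the processors + 2 window from the source, while the chunked range starts at least a full chunk of 32.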

How many threads does Clojure's pmap function spawn for URL-fetching operations?
